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Editorial 

Eureka 64 


D espite various organizational issues, which delayed the date 
of publication by a few months, it has been a pleasure to edit 
Eureka. It is a rewarding experience, not just for me, but 
also the entire editing team. 

In July 2015, space probe New Horizons became the hrst spacecratt 
to fly by Pluto, with detailed measurements and observations made. 
This is a great discovery of our understanding to the Solar System. 
This resembles the meaning of ‘Eureka - ‘I have found it!’. We pub- 
lish this magazine with the spirit of exploration, hoping that this can 
be an inspiration of further discoveries. 

We continue the usual practice of publishing mathematical articles 
in various Aelds, such as Bounded Gaps between Primes , Quantum 
Mechanics in the Sky and Puzzles , Prisoners and Probability , so that 
readers who are interested in any part of mathematics will find 
something interesting. 


Editor 

Long Tin Chon (Trinity Hall) 

Assistant Editors 

Sam Burr (St Cothorine's) 

Adom Connolly (Christ's) 
Johnny Nicholson (Emmonuel) 
Morio Tong (StJohn's) 

Nicholos Wong (Jesus) 

Subscriptions Manager 

Mike Wong (Clore) 


This year, the publication of Qarch, our problems journal, has been 
resumed. I would like to thank Leo Lai, the Qarch Editor, and his 
publication team, for their great effort in making Qarch another 
success. 


It is my privilege to work with a wonderful editorial team, whom 
I would like to thank for all of their hard work. I would also like 
to thank former editor Jasper Bird and Yanitsa Pehova for their in- 
valuable advice and tremendous support, as well as our writers, our 
sponsors, the Archimedeans, and our readers. Without you none of 
this would have been possible. Happy reading! 


Long Tin Chan 
December, 2015 
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Tne Archimedeans 

DaochenWang, President 2014 - 2015 


2014-15 has been a great year for the Archime- 
deans, a year which marked 80 years since our 
society’s founding. 

In a spirit of celebration, we started our year 
by co-hosting the inaugural Maxwell Lecture 
with the Cambridge University Physics Soci- 
ety. The evening lecture, delivered by Profes- 
sor Gerard t’Hooft on his recent proposal of 
a classical treatment of quantum laws, was 
enthusiastically received by a record-breaking 
audience of over 500. On the next morning 
was our Freshers Talk given by Professor Tim 
Gowers following which the society welcomed 
nearly 100 Freshers as new Life Members in 
the Small Examinations Room. No expenses 
were spared to ensure an abundance of pizza. 

Going into Michaelmas term proper we host- 
ed talks on a wide ranging variety of subjects. 
These went from mathematical biology to ge- 
ometry to string theory. Personal highlights 
included “The Geometry of Speech” by Pro- 
fessor John Ashton where we saw entire lan- 
guages being converted into manifolds and 
“The Mystery of Spinors” by Professor Michael 

The Committee 2014-2015 

President 

Doochen Wong (SidneySussex) 

Vice-President 

EmilyBoin (Emmonuel) 

Corporate Officer 

Adom Goucher (Trinity) 

External Secretary 

Andrew Yiu (Christ's) 

Internal Secretary 

lvon Loh (St Edmund's) 


Atiyah who never ceases to amaze all of us 
with his ability to explain apparently compli- 
cated concepts with so much clarity and ease. 

Our energy and momentum did not let go in 
Lent. Thanks to Leo Lai, the term began by our 
relaunching the dormant QARCH problems 
which asks, among other things, for a proof of 
a disguised Riemann Hypothesis. The major- 
ity of the talks this term focused on combina- 
torics and the most memorable moment must 
be during Professor Imre Leader’s tutorial on 
“Eating and Racing” when we the audience lit- 
erally “laughed out loud” in response to the 
end of a proof using a method called strategy 
stealing. As tradition dictates, we also hosted 
the Annual Problems Drive and the Annual 
Dinner towards the end of Lent. Both events 
were capably organised by Andrew Yiu. 

It has been a great honour to be involved with 
the Archimedeans. I thank all of this year’s 
committee for their hard work and I am sure 
Andrew and his team for 2015-16 will do an 
even better job as we go into our 81st year. 


Treasurer 

Sreyo Soho (Murroy Edwords) 

Events Managers 

Doon von de Weem (Elomerton) 

Publicity Officer 

Eiki Norizuki (Jesus) 

Webmaster 

Joe Tomkinson (Trinity) 
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Using Asymptotic 
Methods to 
lnvestigate the Noise 
Generated by 
Aeroplanes 

Dr. Lorna Ayton 

Junior Research Fellow, DAMTP 


Aeroacoustics is a branch of fluid dynamics 
which studies the noise generated by flow inter- 
actions with aerodynamic bodies. Arguably the 
most dominant studies focus on the noise gener- 
ated by aeroengines, like those found on modern 
passenger aeroplanes, because, let's face it, theyre 
pretty darn noisy! Aeroengines are complicated 
machines, with four key sources of noise, one 
of which is generated by the interaction of the 
blades within the engine with the air flow through 
the engine, and this is the noise discussed here. 

The flow impinging on an initial layer of blades 
is assumed to be steady and uniform. The blades 
rotate and generate unsteady vorticity in the flow 
downstream, which goes on to interact with a sec- 
ond layer of blades and this generates noise. We 
study this noise and hope to relate it to parame- 
ters such as the blade geometry, the Mach number 
of the background steady flow, and the frequency 
of the unsteady vorticity, with the hope that this 
will allow us to hnd an optimal set of param- 
eters that will reduce the overall unwanted noise, 
but not hinder the performance of the engine. 

The noise of specihc interest is that propagating 
away from the engine, because unless youre on- 
board the aeroplane, youTl only hear the noise 
from far away. A common approach to hnding the 
noise is to ask a computer for help, however, be- 
cause we want to compute the far-field noise, we 
need a very large computational domain. Further 
complications arise because high frequency noise 
is a key contributor. To retain accuracy as the fre- 
quency is increased, the grid resolution must also 
be increased. Overall this results in codes which 
have long runtimes, or lose accuracy at high fre- 
quencies. Analysing the effects of altering, for ex- 
ample, the blade geometry, would therefore take 
a very long time (assuming your code is accu- 
rate), and moreover would only illustrate a trend 
in behaviour as you alter the geometry, rather 
than yield a functional relationship between the 
noise generated and the geometry of the blades. 


We therefore analytically model the far-field noise 
generated by so-called “blade-blade” interactions 
to obtain an asymptotic solution which depends 
on all the parameters of the problem. The solution 
allows us to quickly and efhciently discover the 
effects of altering any parameter on the far-field 
noise. Thankfully, a differential equation describ- 
ing the interaction of unsteady vorticity with a 
thin aerofoil already exists [2], although its not 
particularly pleasant: 

+ + 2 + k 2 (w 2 + 5 2 )h) 

_(j±mL e ^_ lk§h) = eSei n 

The coordinates, (<£, 0 ), are dehned as the ve- 
locity potential and the streamhmction of the 
background steady flow around the aerofoil 
(so that the solid aerofoil surface is dehned as 
0 = 0, (j) G [0, (j) e \ , on which a zero nor- 
mal flow boundary condition is imposed), 
and the function h describes the sound gener- 
ated by the interaction. There are various con- 
stants, M, /L, y> w, d, which allow us to con- 
sider different background flow conditions. 

The constant, e <C 1, measures the relative size 
of the aerofoil thickness and camber to its length, 
whilst the constant, k 1, is the (reduced) fre- 
quency of the incident vortical disturbance. The 
function q((p, 0 ) depends on the exact geometry 
of the aerofoil, including effects of thickness, 
camber and angle of attack to the background 
steady flow. The source terms, S(<p, 0 ) and 
0(<p, 0 ), describe the vortical disturbance. 



Figure 1: The model “blade-blade” interaction 
problem; rotating rotors create unsteady wakes in 
the steady background flow which interact with 
the stationary stators. 
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If the aerofoil is reduced to a flat plate (e = 0) 
we have only the Helmholtz equation to solve, 
however for any realistic geometry, all terms 
contribute. To analytically solve the equation 
we make use of the large and small parameters, 
k and , to construct an asymptotic solution, 
h = h 0 (<f), 'ip) + eVkh±((p, 'ip) + 0(e, /c _1 ) 

where e, k~ x « ey/k « 1. This process is illus- 
trated by the following example. 

Example 

Consider the differential equation 

ey"(x) + (1 + 2 e)y'(x) + 2 y(x) = 0 

with boundary conditions, 2/(0) =0 and 
y( 1) = 1, where e « 1. If x = 0(1), then 
y"(x),y'(x) = 0(y), and the largest terms 
in the equation are y and 2 y which we suppose 
to be 0(1) because of the boundary condition 
2/(1) = 1. If x = O(e), then y"(x) = 0(e~ 2 y) 
and the largest terms in the equation dxey”(x) and 
y(x) both of which are 0(e J y). We see that for dif- 
ferent scalings of x we obtain different dominant 
terms in the differential equation. To solve this 
we therefore construct two solutions, one which 
is valid for x = 0(1) and one when x = 0(e). 

For x = 0(1), we propose a solution 
V = yo(x) + yi(x ) + 0(e 2 ) 
Substituting this into the equation and equating 
0(1) terms gives y' 0 (x) + 2y 0 (x) = 0, and the 
boundary condition is yo(l) = 1. The solution is 
2/o = e 2 ^~ x \ Considering the 0(e) terms, we 
require y((x) + 2yi(x) = -y' 0 '(x) - 2y' 0 (x) , 

with boundary condition :>/i (1) = 0 (since 
1/(1) = 1 is satished at hrst order with y 0 ). This 
gives solutiony^ = 0, soy(x) ~ e 2(i-a;) 0(e 2 ) 

For x = 0(e), we change variables, y -+Y, and 
x -+ X = x/e. The equation becomes 
Y"(X) + (l + 2e)Y'(X) + 2eY(X) =0 , 
with boundary condition Y(0) = 0. We suppose 
Y (X) = Yo(X) + eYl (X) + 0(e 2 ) ; and solve for 

7 0 1 similarly, giving 

Y(X) » A( 1 - e~ x ) + e[B( 1 - e~ x ) - 2AX] 

The constants A and B are unknown constants of 
integration; one arises from the second order dif- 
ferential equation for Y 0 and one for Y v 


To determine these constants and hnish the 
solution we use the principle of matched asymp- 
totic expansions [3], which essentially says that 
as X becomes very large in Y, we must have the 
same solution as when x becomes very small in 
y. I.e. as we break out of the x = O(e) region from 
the Y solution by making X large, we should get 
the same answer as when we break into the region 
from the other side by taking x to be small in the 
y solution. 

For small x, 

y(x) ~ e 2 (l — 2x + 0(x 2 )) 

and for large X, 

Y(X) « A - 2AeX + eB + 0(e 2 ) 



Figure 2: Comparisson of the approximate solu- 
tions, y and Y, against the actual solution. 


Substituting X = x/e into this second expansion 
and equating the coefhcients of x and £ gives A = 
e 2 and B = 0. This completes the solution for Y(X). 

Figure 2 shows the two solutions, y and 
Y, plotted against the actual solution to the 
original differential equation, when e = 0.05. 
We see that the approximations give very good 
agreement to the actual solution except in a 
small region near to x = e. As long as we are 
only interested in the solution away from this 
erroneous region, we could use our ap- 
proximation rather than the actual solution. 
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outer region 
(ii) 



Figure 3: Asymptotic regions around an aerofoil required to obtain the far-field, “outer”, solution. 
The coordinates in the inner regions (i) and (iv) scale as O^/c' 1 ), in the transition regions (iii) and 
(v) scale as 0(k' m ) and in the outer region (ii) scale as 0(1). 


For the aeroacoustic question, we are only inter- 
ested in the solution far away from the aerofoil. 
We obtain that solution by using the principle of 
matched asymptotic expansions for all of the re- 
gions illustrated in Figure 3. An example solution 
is plotted in Figure 4, which shows a polar plot of 
the magnitude of the far-field acoustic pressure as 
a function of observer angle. The aerofoil is locat- 
ed at the origin. We see a decrease in noise gener- 
ated upstream (the left half plane) as we increase 
thickness, but altering 

thickness has much less of an effect downstream 
(the right half plane). The aerofoils used in Figure 
4 have a small amount of camber, which results in 
the asymmetric plot - the noise generated above 
the aerofoil is not the same as that below, and we 
can see that at certain angles below the aerofoil 
(LHP) the zero thickness aerofoil generates less 
noise than those with thickness, however above 
the aerofoil, the noise decreases with thickness. 

Further details of the method and solution 
to the aerofoil noise generation problem can be 
found in [1]. 



Figure 4: Far-field acoustic pressure magnitude 
generated by a vortical interaction in steady flow 
with a cambered NACA 4-digit aerofoil, for vary- 
ing observer angle, 0. The legend denotes the per- 
centage thickness of the aerofoil compared to its 
length. 
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Quantum Mechanics 
in the Sky 

Dr. Daniel Baumann 

Reoder in Theoretical Physics, DAMTP 


I magine you look at the night sky. You ran- 
domly select a patch of the sky only a fraction 
of the size of the full moon. To the naked eye it 
will lookpitch black. This doesnt mean that 
there is no light coming from that region. It just 
means that less than seven photons per second 
are entering your eyes, in which case your brain 
cannot form an image. A camera, however, 
can collect photons over a long period of time. 
This is what the camera on the Hubble Space 
Telescope (HST) has done. Instead of looking for 
just an instant, the HST collected light for 
more than 12 days. The result is one of the most 
stunning astronomical images ever produced: 
see fig. 1. Every object in this picture is an entire 
galaxy! We see a few thousand of them, each 
containing billions of stars. This becomes even 
more remarkable if we remember that this tiny 
patch of the sky was selected at random. Any oth- 
er randomly selected region in the sky would 
look essentially the same. From this we can esti- 
mate that the observable universe contains some 
trillion galaxies and a few billion trillion stars. In 
this essay, I will describe our best answer to 
the question: Where did it all come from? 

The Cosmic Microwave 
Background 

Let us start 380,000 years after the Big Bang. The 
universe had just cooled enough for the first 
atoms to form. From this moment on, light was 
able to propagate freely. Today, 13.8 billion years 
later, we observe this afterglow of the Big Bang as 
the cosmic microwave background (CMB). 

The fact that the intensity of the CMB varies 
across the sky (see fig. 2) means that the matter 
in the early universe wasn't distributed uniformly. 
Over time, and under the inAuence of gravity, 
these matter Auctuations grew. Regions of space 



Figure 1: The Hubble Ultra Deep Tield. The image 
contains thousands ofgalaxies in an area that is 
only a fraction ofthe size ofthe moon. 


that contained more than an average amount of 
matter, accreted matter from their surroundings 
and therefore got denser. Eventually, the local 
density became high enough for galaxies, stars 
and planets to form. This part of the story is 
well-understood; what needs explaining is where 
the small seed Auctuations came from. 

On closer inspection, we realise that there is a 
problem with the map shown in fig. 2. The 
Auctuations in the map arent just noise, but dis- 
play non-random correlations across large re- 
gions of the sky. On the other hand, the universe 
was very young when the CMB was created. Even 
signals travelling at the speed of light wouldn't 
have gotten very far. In particular, regions sepa- 
rated by more than the size of the white circles 
in fig. 2 didn't have enough time to communicate 
with each other. We say that they were out of caus- 
al contact when the background radiation 
was created. Having anything but random noise 
over large patches of the sky therefore seems to 
violate causality. 
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The Causality Problem 

It was this causality problem that Alan Guth was 
thinking about in the night of December 6, 1979. 
Guth is now a professor at MIT, but at the time 
he was a researcher at the Stanford Linear Ac- 
celerator Centre, struggling to hnd a permanent 
job. Having been trained in particle physics, Guth 
knew very little about cosmology. However, a year 
earlier he had learned about the conceptual prob- 
lems of the standard Big Bang theory from a talk 
by Robert Dicke. In the night of December 6th, 
he was led to thinking about it again. What hap- 
pened next has become history. 

In a flash of insight, Guth understood that the 
problems of the standard cosmology would be 
resolved if the early universe went through a brief 
period of superluminal expansion. He imagined 
that, for a fraction of a second, space expanded 
nearly exponentially. 

Points that were initially very close to each other 
got stretched apart at an enormous rate, faster 
than the speed of light. According to the theory 
of cosmic injlation , everything we see in the sky 
today was initially compressed into a tiny, caus- 
ally-connected, region of space. Faster-than-light 
expansion of space then blows up such a micro- 
scopic region to enormous size and gives the illu- 
sion of a causality problem. In reality, the causal- 
ity problem would just be an artefact of assuming 
that the expansion of space never exceeded the 
speed of light. 


Although inflation solves the causality problem, 
for explaining the origin of structure it seems to 
be a disaster. Any seed Auctuations that might 
have existed before inhation are spread apart at 
such an enormous rate that nothing survives af- 
ter inflation. InHation seems to make the universe 
empty and featureless. This is clearly not the uni- 
verse we live in. Fortunately, quantum mechanics 
comes to the rescue. 

QM to the rescue! 

In quantum mechanics, empty space is never 
completely empty, as this would violate the 
Heisenberg uncertainty principle. Instead, even 
the vacuum needs to Auctuate: energy and parti- 
cles can appear and disappear spontaneously (see 
fig. 3). These quantum vacuum Auctuations have 
many important consequences for physics. Some- 
times, they have observable effects like the Lamb 
shijt and the Casimir force. The Lamb shift refers 
to a shift in the energy levels of hydrogen atoms, 
as the electrons in the atoms interact with the 
quantum vacuum. This efiect was measured by 
Willis Lamb in 1947. Shortly after, Hans Bethe cal- 
culated the Lamb shift by showing how the fluc- 
tuations in the vacuum disturb the electron while 
it is revolving around the proton. Besides these 
observable efiects, vacuum fluctuations are also 
crucial for the theoretical consistency of modern 
physics. Quantum field theory and the Standard 
Model of particle physics would make no sense 
without quantum vacuum fluctuations. 



Figure 2: Temperature variations in the cosmic microwave background as observed by the 
Plancksatellite. Red (blue) spots are hotter (colder) than the average temperature, retlecting 
density variations in the primordial plasma. The white circles indicate the maximal size ofre- 
gions that could have exchanged signals before the time that the CMB was created (accord- 
ing to the standard Big Bang theory). The points A and B naively were out ofcausal contact. 
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There is therefore no doubt that quantum vacuum 
Auctuations really exist. 

Under ordinary circumstances, vacuum Auctua- 
tions have tiny effects. However, during indation 
they become dramatically important. Because of 
random quantum Auctuations, the inhationary 
energy density in some parts of the universe was 
slightly higher than average. In these regions, in- 
Aation lasted a bit longer, while in other regions 
inflation terminated earlier. This difference in the 
evolution amplies the Auctuations, resulting in 
signicant density Auctuations after inflation. 

The exponential expansion of space during infla- 
tion stretches the vacuum fluctuations to enor- 
mous scales. Small-scale correlations in the quan- 
tum vacuum can therefore leave imprints in the 
large-scale density fluctuations. Moreover, the 
combination of inflation and quantum mechan- 
ics doesn't just make the apparently acausal cor- 
relations in the CMB possible, it also predicts the 
specihc form that these correlations should take. 
Roughly speaking, inflation predicts the probabil- 
ity P(6) that two points in the sky separated by an 
angle 6 are both hotter (or colder) than average. 
The function P(6) is expected to have a character- 
istic shape, i.e. the amount of correlation varies in 
a specic way as a function of angular separation. 
Amazingly, in recent years, it has become possible 
to measure the fluctuations in the CMB accurately 
enough to determine the function P(6) and com- 
pare it to the prediction from inflation. The stun- 
ning result is shown in fig. 4. 

What next? 

So, are we done? Do we understand everything? 



Figure 3: Quantum tluctuations in quantum chromo- 
dynamics (QCD). The energy density associated with the 
gluon tields tluctuates spontaneously. 

Far from it. While fig. 4 is compelling evidence for 
inflation, it doesnt yet exclude alternative expla- 
nations for the origin of structure. 

Moreover, the physics of inflation remains myste- 
rious. We still don't know what caused the burst 
of inflationary expansion. In fact, the require- 
ments for successful inflation are rather dramatic. 
The expansion has to be superluminal and must 
increase the size of the universe by more than a 
factor of 10 26 in just 10' 33 seconds! This can hap- 
pen in General Relativity if the universe is hlled 
with a source of negative pressure. However, or- 
dinary matter doesn't behave in this way, so it is 
likely that inflation requires more exotic physics. 
We hope that future observations will us hints 
for what this physics might be. In the meantime, 
theorists like myself are working hard to discover 
mechanisms that could explain the inflationary 
epoch or find alternatives. At stake is nothing less 
than a complete understanding of the origin of all 
structure in the universe! 


P(0) 
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Figure 4: Comparison ofthe latest measurements ofthe fluctuations in the CMB (bluepoints) with the 
prediction from inflation (red curve). 
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Introduction: 

A right angled triangle with sides as positive inte- 
gers is also known as Pythagorean triangle. We 
will denote the length of the small side by s, the 
length of the large side by l and the hypotenuse 
length by h of a Pythagorean triangle. As is well 
known (Pythagoras Theorem), [ 1 ]. 

s 2 + l 2 = h 2 (1) 

Instead of working with Pythagorean triangles 
geometrically or working with equations like (1), 
it is more convenient to work with triples (s, /, h ), 
known as Pythagorean triples. 

It is well known that by using Euclids formula 
(stated below) one can generate inhnite num- 
ber of Pythagorean triples (but not all of them). 
However generating sequences of Pythagorean 
triples help exhibit interesting relations within 
and among triples besides (hopefully) raising 
mathematical curiosity for the reader in establish- 
ing links between sequences of triples with other 
well known mathematical structures as shown in 
O’ Shea [1] and [2]. 

To refresh reader s memory, we state Euclid^s For- 
mula: For any two positive integers m and n, with 
m > n, a Pythagorean triple is formed by 2mn, 
m 2 - n 2 , m 2 + n 2 . Obviously m 2 + n 2 represents 
the hypotenuse length, the large (small) side is 
represented by larger (smaller) of the remaining 
two terms. 

0’Sheas presentation is restricted to one sequence 
of Pythagorean triples, namely (3,4, 5), (5,12,13), 
(7, 24, 25), (9, 40, 41), (11, 60, 61), (13, 84, 85), ... 
and each triple in the sequence is such that 
h-l= 1. 

In our derivations we also restrict to one sequence 
of Pythagorean triples, generated in four ditTerent 
ways using Lucas sequence. In our sequence of 
Pythagorean triples h - l increases as components 
of triples increase in magnitude. 

Lucas sequence is specihed by letting L x = 1, L 2 
= 3, and L n = L n _ x + L n _ 2 for n = 3, 4, 5, ... In the 
lengthy form it is: 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 
199, 322, 521, ... 

Sometimes we will use symbols L x , I 2 , I 3 , I 4 , ... 
and identify the above string of numbers, also 


known as Lucas numbers, with symbols 

Observation 1: Using any subsequence of nat- 
ural numbers (not just Lucas sequence) and con- 
structing fractions of the form of ( c/d + d/c)/2 or 
( c/d - d/c)/2 with c and d positive integers and 
c > d to generate a sequence of Pythagorean tri- 
ples will always end up with components of triples 
satisfying the Eulicds formula. An illustration of 
this observation is in Section (A) below. 

Generating a Sequence of Pythagorean Tri- 
ples in Different Ways. 

Section (A): Select subsequences of size two us- 
ing Lucas sequence, starting with number 3 as fol- 
lows. So we have 3, 4; 4,7; 7,11; 11, 18; and so 
on. Then using each subsequence, construct frac- 
tions [4/3 + 3/4]/2; [7/4 + 4/7]/2; [11/7 + 7/11] 
/2 and so on After simplifying each product of 
fractions into a form a/b , 

(i) The numerator is the hypotenuse length. (ii) 
The denominator is the large leg length. 

(iii) the small leg length is computed as follows: 
suppose the sum of two fractions within 
the brackets is c/d + d/c with c > d. Then the small 
leg length is given by (c 2 - d 2 ) 

[4/3 + 3/4]/2 = 25/24 -+ (4 2 -3 2 ,24,25) -+ (7,24,25) 

[7/4+4/7]/2 = 65/56 -+ (7 2 -4 2 , 56,65) -+(33, 56,65) 

[11/7 + 7/11]/2 = 170/154 

-+ (ll 2 - 7 2 , 154, 170) -+ (72, 154, 170) 

[18/11 + ll/18]*(l/2) = 445/396 
-+ (18 2 -ll 2 , 396, 445) -+ (203, 396, 445) 

Note that in each of the above four triples, 
h = l + (c - d) 2 

The n th term, for n = 1,2, 3,... is given by 
[Ln+i/L n +i+L n+x !L n+ 2 ] /2 - [( L n+2 ) 2 + (L n+X ) 2 ] / [2L n+X 
L n + 2 ] * ((I„ +2 ) 2 - (L n+1 ) 2 , 2L n+1 L n+2> ( L n+2 ) 2 + 

(L n+ 1) 2 ) 

We see that the nth triple turns out to be the same 
as given by the Euclids formula. 
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The reader may easily verify that the nth triple 
also satisfy the condition that h = l + ( L n+1 - L n+l ) 2 . 

Section (B): Select subsequences of size three 
from the Lucas sequence, starting with number 1 
as follows. So we have: 1, 3, 4; 3, 4, 7; 4, 7, 11; 7, 
11,18; and so on. Then do the following with each 
subsequence: 

(i) Multiply the hrst number by the sum of the 
middle and the last number. This gives the length 
of the small leg. (ii) Double the product of the 
middle number with the last number. This gives 
the length of the large leg. (iii) The hypotenuse 
length is the large leg length plus the square of the 
hrst term of the subsequence. 

1, 3, 4 -»lx(3+4), 2x(3x4), 2x(3x4)+l4 (7, 24, 
25) 

3, 4, 7 + 3x(4+7), 2x(4x7), 2x(4x7)+3 2 
+ (33, 56, 65) 

4, 7, 11 -+ 4x(7+ll), 2x(7xll), 2x(7xll)+4 2 
+ (72, 154, 170) 


(iii) The square of the hrst number in each subse- 
quence plus the large leg length is the hypotenuse 
length. 

1,3,7 + lx7,3x(l + 7),3x(l+7) + l 2 
-+ (7,24,25) 

3, 4, 11 + 3 x 11, 4 x (3 + 11), 4 x (3 + 11) + 3 2 
-> (33,56,65) 

4, 7, 18 -> 4 x 18, 7 x (4 + 18), 7 x (4 + 18) + 4 2 
+ (72,154,170) 

7, 11, 29 + 7 x 29, 11 x (7 + 29), 11 x (7 + 29) + 
7 2 -> (203, 396, 445) and so on 

The n th term, for n = 1, 2, 3, ..., is given by 
L n y L n+ 1, L n+3 

"* (L n L n + 3> L n+1 (L n + L n+3 ), L n+l (L n + L n +3 ) + L n ) 

( 2 ) 

Recall that by dehnition of Lucas sequence, L n = 
L n + 2 -4+i and 4+3= 4+2 +4+i- ReplacingL n 
and L n+3 in the Pythagorean triple in (2), we have 


7,11,18 -+ 7x(ll+18), 2x(llxl8), 2x(llxl8)+ 
7 2 -+ (203, 396, 445) and so on 

The n th term, for n = 1, 2, 3, ... is given by 

4» 4+i» 4+2"* (4 (4+1+4+2)» 2 4 +i4+2» 

24+i4+ 2 +4 ) 

Now by dehnition of Lucas sequence, L n+1 = 4+1 
+ L n or L n = L n+2 - 4 +1 . So we may write the n th 
triple as 


Section (D): Select subsequences of four con- 
secutive Lucas numbers starting with 1 as shown 
below and perform the following operations with 
each subsequence: 


4» L n+ 1 , 4+2 

■* ((4+2 — 4+i)(4+2+4+i)> 2 4+24+i» 2 4+24+i + 
(4+2-4+1) 2 ) 

+ ( 4 + 2 2 - 4+4 2 4+24+i» 4+2 2 + 4+ 1 2 ) 

Section (C): Select subsequences of size three, 
skipping every third number in the Lucas se- 
quence as follows. So we have 1, 3, 7; 3,4,11; 4. 7, 
18; 7,11, 29; and so on Next do the following for 
each subsequence: 

(i) Multiply the hrst and the last numbers. This 
gives the length of the small leg. 

(ii) Multiply the middle number by the sum of 
hrst and the last numbers. This gives the length 
of large leg. 


(i) Multiply the end numbers of each subse- 
quence. This gives the small leg length. 

(ii) Double the product of two middle numbers. 
This is the large leg length. 

(iii) The hypotenuse length is the large leg length 
plus square of the hrst number in each subse- 
quence. 

1, 3, 4, 7 -+ 1 x 7, 2 (3 x 4), 2 (3 x 4) + l 2 

-+ (7, 24, 25) 

3, 4, 7, 11 ■> 3x 11, 2 (4 x 7), 2 (4 x 7) + 3 2 

-+ (33, 56, 65) 

4,7,11,18 -+ 4x18,2(7x11), 2(7xll)+4 2 

-+ (72,154,170) 
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7, 11, 18, 29 7 x 29, 2 (11 x 18), 2 (11 x 18) + 

7 2 -> (203, 396, 445) 

The n th term, for n- 1,2, 3,... is given by 
-/- / H>L n+1 ,L n+ 2,L n+3 -^(L n L n+3 ,2L n+1 L n+2 ,2L n+1 L n+ 2+L n ) 
(3) Since L n+3 = L n+1 + L n+1 and L n = L n+1 
- L n+l , the triple in (3) reduces to L n , L n+V L n+2 , 

L n +3 

(L n + 2 2 ~L n+1 \ 2L n+1 L n+2 , L n+2 2 +L n+1 2 ), as shown 
above. 

Observation 2: We note an interesting happen- 
ing in the generated sequence of Pythagorean tri- 
ples. The sum of the lengths of the three sides of 
each Pythagorean triangle is equal to the length 
of the large leg of the next triangle. For instance 
7 + 24 + 25 = 56, 33 + 56 + 65 = 154 and so on. 
This is true in general, as we show below. Sum of 
the three sides of the n th Pythagorean triangle is 


2s 3 + 10 = 2(72) + 10 = 154 = / 3 , 

2s 4 - 10 = 2(203) - 10= 396 = / 4 and so on. 

In general one may conjecture, that in the n th tri- 
ple, 2s n ± 10 = / n where one adds 10 if n is odd and 
subtract 10 if n is even. The conjecture is correct. 
A proof may be outlined as follows: One needs to 
show that 2 (L n +2 2 - L n +1 2 ) ± 10 = 2 L n +1 L n +2 which 
is equivalent to (using some basic Lucas sequence 
identities) showing that L n+l (L n +i L n ) L n - 5 
which can be shown to hold using mathematical 
induction. 

So the components of triples of the generated Py- 
thagorean sequence are interestingly tied together 
in more than one way as pointed out in Observa- 
tions 2 and 3. 


L n L n +3 + 2L n+1 L n+ ^ +2L n+1 L n+2 +L n 2 
2L m , iL m+ 9 + L M , 9 + L 


as shown above, 


— 2L n+ ^ + 2L n+1 L n+2 — 2L n+2 (L n+2 +L n+1 ) 

The (n+l) th triple is (L n+1 L n+4 , 2L n+2 L n+3 , 

2L n+2 L n+3 + L n+1 ) 

The large leg length in the (n +1 ) th triple is 2L n+2 L n+3 
= 2L n+2 (L n+2 +L n+1 ) 


Observation 3 Another interesting situation is 
the following. Let (s n , / n , h n ) be the n th Pythagore- 
an triple in the generated Pythagorean sequence 
of triples. We note that 


2si + 10 = 2(7) + 10 = 24 = l ly 
2s 2 - 10 = 2(33) - 10 = 56 = / 2 , 


Observation 4 As seen in the hrst four triples 
of the generated Pythagorean sequence of triples, 
each hypotenuse length is a multiple of 5. This 
is no surprise since it is given by L n +2 2 + L n+1 for 
n = 1, 2, 3,. . . and the sum of squares of any two 
consecutive numbers of the Lucas sequence is a 
multiple of 5. 
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In May 2013, Yitang Zhang stunned the math- 
ematical world by proving the following result, 
which we shall loosely refer to as “bounded gaps 
between primes”. 

Theorem 1 (Zhang) 

There is an absolute constant H such that there are 
innitely many pairs of distinct primes diering by 
at most H. 

The celebrated twin prime conjecture asserts 
that one may take H - 2. Zhang obtained H = 
70000000, but subsequent feverish activity by a 
massively collaborative Polymath project reduced 
this to 4680. In late 2013, James Maynard and 
Terry Tao found a much simpler proof of Zhangs 
result giving H = 600, and a further Polymath pro- 
ject based on this work has, at the time of writing, 
reduced H to 246. 

Something I wish to emphasise in this article is 
that Zhangs result should be thought of as the 
culmination of ideas about prime numbers devel- 
oped by many of the great analytic number theo- 
rists of the 20th century. It melds two important 
bodies of work: 

1. Ideas of Goldston, Pintz and Yldrm, building 
on work of Selberg and others, establishing a link 
between the distribution of primes in arithmetic 
progressions and small gaps between primes; 

2. Ideas of Bombieri, Fouvry, Friedlander and 
Iwaniec, building on but going well beyond work 
of Bombieri-Vinogradov, pinning down strong 
results about how primes are distributed in arith- 
metic progressions. 

Primes in progressions 

The distribution of primes in progressions is cen- 
tral to the whole story, so let us begin with a brief 
tour of that subject. We begin by recalling the 
prime number theorem, which states that7r(V), 
the number of primes less than or equal to X, is 
roughly X / log X • The fact that log X —I 00, 
which means the primes have density tending to 
zero, of course explains why Zhangs theorem is 
not at all obvious: the average gap between primes 
less than X is about log X. One might also remark 
that Zhang’s theorem is not at all surprising , either, 
since a random set of X/ log X integers less than 
X will, for X large, have many pairs spaced by at 
most 2. In fact, it will have many pairs spaced 


by at most 1, something not true for the primes 
themselves of course - to model the primes by a 
random set one has to be more careful and take 
account, for example, of the fact that most primes 
are odd. It is convenient to state the prime num- 
ber theorem a little dierently by introducing the 
von Mangoldt function A(n), which is basically 
dehned to equal log n if n is a prime 1 . Then the 
prime number theorem is equivalent to saying 

that y: ~ * 

How many primes x satisfy some additional con- 
gruence condition x = c (mod d) ? Clearly 
(apart from for very small primes x<d) we must 
have c coprime to d, but there is no obvious addi- 
tional restriction. In fact, one expects the primes 
to be equally distributed amongst the^(d)residue 
classes coprime to d. If this is the case, we have 


^2 

x<X,x=c (mod d) 


X 


When this is true for a given value of d and for all 
c coprime to d then we shall say that the primes 
are nicely distributed 2 modulo d. Proving this 
statement is another matter. Think of X tending 
to inhnity: then one would like to know that the 
primes are nicely distributed modulo d for all d 
up to some limit, which it would be nice to take 
as large as possible. This is, however, only known 
when d is less than a power of log X (in fact d 
can be less than any power of log X, a statement 
known as the Siegel-Walfisz theorem). 


It turns out that proving that the primes are nicely 
distributed for d up to about X 1 ' 2 is equivalent 
to the Generalised Riemann Hypothesis, so one 
should not expect too much progress soon. Re- 
markably, one can do far better if one is prepared 
to know that the primes are nicely distributed only 
for 3 almost all d. The classic result in this vein is 
the Bombieri-Vinogradov theorem, which asserts 
that the primes are nicely distributed modulo d 
for almost all d<X 1/2 . The Bombieri-Vinogradov 
theorem is often described as a kind of Riemann 
Hypothesis on Ayera ge. 

1 And log p if n = p k is a prime power. 

2 Of course, this is only an informal definition and 
not a rigorous one because we have not elaborated 
upon the meaning of «. 

3 Once again, this is an informal dehnition. To make 
it rigorous, one would need to elaborate upon the 
meaning of almost all as well as the « notation from 
earlier. 
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It is suspected that even more is true, a con- 
jecture known as the Elliott-Halberstam Con- 
jecture. There is a diherent Elliott-Halberstam 
conjecture EH(6) for each value of 6 < 1, and 
these conjectures get stronger as 6 increases. 
Here is a rough statement: 

Conjecture 2 

(Elliott Halberstam conjecture EH(0)) 

The primes are nicely distributed modulo d for 
most values of d up to X e . Currently, we cannot 
prove this statement for any value of 6 > 1/2. 

GPYand BFFI 

Goldston, Pintz and Yildirim (henceforth re- 
ferred to as GPY) established a remarkable link 
between the problem of hnding bounded gaps 
between primes and the Elliott-Halberstam 
conjecture. 

Theorem 3 (GPY, 2005) 

Suppose the Elliott-Halberstam conjecture 
EH(6) holds for some value of 6 > 1/2. That 
is, suppose the primes are nicely distributed 
modulo d for most values of d up to X 9 . Then 
we have bounded gaps between primes. 

One should also say that GPY uncondition- 
ally proved some results about gaps between 
primes far superior to any that had appeared 
before their work, showing for example that 
there are always pairs of primes of size around 
X and separated by about ^\ogX , far less 
than the average spacing of log X for primes 
of this size. 

About 20 years before that, in the 1980s, deep 
work of Bombieri, Fouvry, Friedlander and Iwan- 
iec (in various combinations) had shown that a 
certain weak variant of the Elliott-Halberstam 
conjecture is true for some 6 > 1/2. In fact they 
obtained 6 = A/7 in one of their results. Unfor- 
tunately, their result came with some technical 
restrictions which meant that it could apparently 
not be combined with the GPY method to prove 
bounded gaps between primes 4 . (As an aside, 
one of the main ingredients in these works of 
BEEI were certain c omplicated estimates for 
4 There is a technical discussion of exactly why not 
in the book Opera Cribro by Friedlander and Iwan- 
iec, pages 408-409. 


sums of Kloosterman sums due to Deshouill- 
ers and Iwaniec, and coming from the analytic 
theory of automorphic forms; some people 
apparently refer to this era as Kloostermania.) 

What Zhang succeeded in doing is modifying, in 
a quite nontrivial way, both the GPY method and 
the BFFI ideas so that they meet in the middle. 
As it turns out, an equivalent modihcation of the 
GPY method had already been published by Mo- 
tohashi and Pintz. They observed that in the El- 
liott Halberstam conjecture EH(6) one does not 
need the primes to be nicely distributed modulo 
d for most d up to X 9 . Rather one only needs 5 
this to be so for most smooth values of d , that 
is to say values of d with no prime factors big- 
ger than X 6 for some very small S. In particular, 
one does not need to know anything at all about 
the case when d is prime or almost prime. Thus 
Motohashi and Pintz, and independently Zhang, 
established a result of the following form. 

Theorem 4 (Motohashi-Pintz, Zhang) 

Suppose that the primes are satisfactorily distrib- 
uted modulo d for most smooth values of d up 
to X 6 , for some value of 6 > 1/2. Then we have 
bounded gaps between primes. 

We could, if we wanted, call the fact that the 
primes are satisfactorily distributed modulo d for 
most smooth values of d up to X e the weak Elliott- 
Halberstam conjecture, and denote it EH weak (6). 
To reiterate, then, Motohashi-Pintz-Zhang prove 
that if we have EH weak (6) for any 6 > 1/2 then we 
still get bounded gaps between primes. 

Incidentally, now might be a good time to remark 
that the bound for the gap H depends on how 
close 6 is to 1/2, the relation being very roughly 
of the form H ~ (6- l/2)~ 3/2 with a suitably opti- 
mised version of the argument, based on work of 
Conrey, Farkas, Pintz and Rev'esz. 

The heart of Zhangs advance is the following re- 
sult. 


5 In fact one can get away with a still weaker prop- 
erty, in which one need only understand the num- 

ber of primes congruent to c mod d for c varying in 
a small set of residue classes modulo d which varies 
in a multiplicative fashion with d. 
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Theorem 5 (Zhang) 

We have the weak Elliott-Halberstam conjecture 
EH weak (6) for 6=1/2 + 1/1168. That is, the primes 
are satisfactorily distributed modulo d for most 
smooth values of d up to X e . Hence, we have 
bounded gaps between primes. 

We now turn to a few more details. First we say 
something about the GPY method (we shant 
say anything here about its modihcation due to 
Motohashi-Pintz and Zhang). This uses ideas 
related to the Selberg sieve. Then, we shall say a 
few words about the very technically demand- 
ing proof of Theorem 5. The key words here are 
bilinear forms, Kloosterman sums and deep 
bounds of Bombieri and Birch coming from 
Delignes proof of the Riemann hypothesis over 
hnite helds. There is no use of automorphic 
form bounds in Zhangs argument, and this is 
where he deviates somewhat from many of the 
papers of BFFI (though there are closely related 
ideas in a paper of Friedlander and Iwaniec on 
the divisor function). 

The GPY method 

What has the distribution of primes modulo d got 
to do with hnding small gaps between primes? 
Exposing this hitherto unseen connection was 
the remarkable advance of Goldston, Pintz and 
Yildirim. 

GPY in fact prove a result that is strictly stronger 
than bounded gaps between primes. We say that a 
k -tuple of integers/ii, • • • , h k is admissible if there 
is no obvious “congruence” or “local” reason why 
n + hi,--- , n + h k cannot all be prime for inhnitely 
many n. For example, {h 1} h 2 ,h 3 } = {0,2,4} is not ad- 
missible, because at least one of these numbers is di- 
visible by 3, whereas {h ly h 2 ,h 3 } = {0,2,6} is admissible 
(though no-one has the slightest idea how to prove 
that n,n+2,n+6 are all prime for inhnitely many n). A 
moment s thought convinces one that the natural cri- 
terion for admissibility is that, for each prime p, the 
set{hi, • • • , h k } omits at least one residue class modu- 
lo p. If n>, is this class then n + h 3 , • • • ,n + h k could 
perhaps all be prime whenn = —n* (mod p), be- 
cause none of these numbers is divisible byp. Here 
is the stronger statement that GPY proved. 


Theorem 6 

Suppose we have the Elliott-Halberstam conjec- 
ture EH(6) for some 6 > 1/2 , that is to say the 
primes are nicely distributed modulo d for most 
values of d up to X 6 . Suppose that k>k 0 (6) is 
sufhciently large. Then for any admissible k-tuple 
hi, • • • , h k , there are inhnitely many n for which 
atleast two of n + h h --- ,n + h k areprime. 

The modihcation of Motohashi-Pintz-Zhang is to 
show that this is still true if we instead assume just 
the weak Elliott-Halberstam conjecture EH weak (6), 
but we shall not be saying anyting further on 
the subject of this modihcation here. To see why 
Theorem 6 implies bounded gaps between primes, 
one need only note that there are admissible k- 
tuples for every k. Indeed the set of all prime 
numbers between M and 2M will be admissible 
for all sufhciently large M, and the number of 
primes in this range grows without bound by 
the prime number theorem or in fact by weaker 
statements. It should be pointed out that hnding 
tightly packed admissible k tuples, which is neces- 
sary to elucidate the relationship between k and 
the prime gap H , brings one into contact with 
some thorny unsolved problems. To a large extent 
one must rely on computations to optimise this 
dependence for any particular value of k. 

We’ 11 sketch the very broad outline of the proof 
of Theorem 6. It uses sieve theory, the branch of 
analytic number theory that has ultimately grown 
out of a serious study of the Sieve of Eratosthenes. 
Something that has been learned, rather painfully, 
over the last 100 years is that 
Almost primes are much easier to deal with than 
primes. 

An r-almost prime is a product of at most r primes. 
Fix an admissible tuple{/H, ■ ■ • ,h k }. The rough 
idea of GPY is to choose an appropriate value of 
r > k and try to compute the expected number 
of n + hi, * • • ,n + h k that are prime, when n is 
selected at random from those n for which the 
product (n + hi) • • • (n + h k ) is an r-almost prime. If 
we can show that this is > 1 then, for some value 
of n, there must be two primes amongst the n+h { . 

To be a little more formal about this, write v(n) 
= 1 if (n + h 1 )...(n + h k ) is an almost prime and 
0 otherwise. Let X be an arbitrary large quantity. 
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Then what we are interested in is the ratio 

Ex<n<2x( A ( n + M) + -(A(rc + h k ))y(n) 

^ X Y,X<n<1X V ( n ) 

(Recall that A is basically the characteristic func- 
tion of the primes weighted by log.) If we can 
show that this is > 1, we will then know that for 
some n6[X,2X) at least two of n + • • • , n + h k 

are prime, and this will conclude the proof of 
Theorem 6. To elaborate this idea, one must be 
able to estimate the numerator and denomina- 
tor of (1). Now we come to a completely crucial 
idea, invented by Selberg. The idea is that there 
are weight functions v(n) which behave morally 
rather like the characteristic function of the al- 
most primes (or the set of n for which 
(n + /ii) • • • (n + h k )is almost-prime), but which 
are much easier to compute with. Let D, 1 < D < 
X, be a parameter and consider the function 

« = ( Y Ad ^ 2 ’ 

d\n,d<D 

where at the moment(A^)i<^<x)is any set of 
real numbers with X x = 1. The weight v(n) is al- 
ways nonnegative, and furthermore if n is prime 
and between D and X then v(n) = 1. The reason 
is that in this case, the only divisors d of n are 1 
and n, and of these only d = 1 satishes d<D. For 
this reason weTl call v a majorant for the primes. 

Lets see how we can use a majorant like this to 
study a classical problem called the Brun-Titch- 
marsh problem: that of estimating from above 
the number of primes in a range [X 0 , X 0 +X). Now 
provided that 6 D = o(X 1/2 ), W e can compute an 
asymptotic for the average value of v(n) over 
X 0 <n<X 0 + X and hence get an upper 
bound for the number of primes in this range. To 
see why the conditionD = o(X 1/2 )is critical, we 
need to do an actual calculation (though a very 
short one): 

E *+) = E (E x < 

X 0 <n<X 0 +X Xo<.n<Xo-\-X d\n,d<.D 

= E x < E 1 ( 2 ) 

d,d'<D X 0 <n<X 0 +X-,d,d'\n 

Now the inner sum counts how many n there are 
in the rangeX 0 < n < X 0 + Xfor which both 
d and d 0 divide n, or equivalently for which the 
lowest common multiple [d,d 0 \ divides n. Note 
that [d, do\ < D 2 . Henceif D = o(X 1/2 )then [d, d 0 \ 

6 This should be read as “a bit smaller than X 1/2 ”. In 
fact, one would require a condition like D < X 1/2 ~ £ 
for some e > 0. 


= o(X) and the answer is essentiallyX/[d, d 0 \ 
(Imagine you were asked how many multiples 
of 2014 there are in the interval [10 10 ,10 10 +10 5 ]: 
since 10 5 /2014 = 49.65... its either 49 or 50, but in 
either case 49.65 is a good approximation.) 
Therefore 

E ■'<»>» E (3) 

X 0 <n<X 0 +X d,d'<D 1 7 J 


If, however, D > X 1/2 , then we could have (in- 
deed we will often have) [d, d 0 \ > X, so the an- 
swer might be 0, or it might be 1. (How many 
multiples of 2014 are there in the interval 
[10 10 ,10 10 +10 2 ]?) We cannot say which without 
carefully inspecting X 0 , and getting a usable ex- 
pression is impossible without further ideas. Let 
us say that X i/2 is the sieving limit for this prob- 
lem, and we call D the sieving level. Note that 
the larger we can take D, the more flexibility we 
have in choosing the weights X d . This ought to 
lead to the majorant v being a better approxim- 
ant to the characteristic function of the primes 
themselves. Miraculously, even though we are 
forced to takeD = o(X 1/2 )there are choices of 
the weights X d for which v is a reasonably good 
approximation to the characteristic function of 
the primes. We can find such X d by minimis- 
ing the quadratic form in (3) subject to A 2 = 1, 
a routine exercise albeit one requiring some fa- 
cility with Mobius inversion. When this is done, 
we find that in fact, for this choice of weights, 


Y « 


2X 


logX 

x 0 <x<x 0 +x & 

When X 0 = 0 the majorant v only overestimates 
the number of primes by a factor of 2. If one 
looks very carefully at v(n) then one sees that it 
resembles a sort of characteristic hmction of al- 
most primes, though it is best not to pursue this 
line of thought too far, leaving it perhaps as mo- 
tivation for calculating somewhat blindly with 
v. Note in particular that v will not in general 
be {0,l}-valued. Note, by the way, that we have 
proven that the number of primes in [X 0 , X 0 +X\ 
is at most about 2XAogX for any X 0 , but that is 
another story. 


It turns out that even if D is a very small power of 
X we still get a majorant v that overestimates the 
primes by a constant factor, although this con- 
stant gets worse as D becomes smaller. 
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Returning to our main discussion, recall (1). 
In the light of the crash course on the Selberg 
sieve we have just given, it is natural to consider 
dehning 

«'(") = ( £ (4) 

d\(n+h 1 )...(n+h k ) 

d<D 

for appropriate weights \ d , where D is as big as 
possible. To recap, one should think of v as telling 
us the extent to which(n + h 1 )...(n + h k ) is almost 
prime, though to attach any more precise mean- 
ing to such a statement one must be more precise 
about the nature of the \ d , about which I shall not 
say any more. 

By a small modihcation of the above reasoning, 
one can compute the denominator in (1) pro- 
vided that the sieving level D is o(X m ). What, 
however, of the numerator? It may be split into 
terms of the form 

A(n + h)u(n ) 

n<X 

for h = h p ...,h k . Trying to repeat the computation 
in (2) above, we instead arrive at the expression 

Y ^d!^d 2 Y, ( 5 ) 

di,d 2 <D n<X 

di,d 2 \(n+hi)...(n+hk) 

To understand this sum, we need to know how 
the primes (weighted using the von Mangoldt 
function A) behave modulo d = [d p d 2 7, and this 
quantity may be as large as D 2 . If we know the 
Elliott-Halberstam conjecture EH(6) (primes are 
nicely distributed modulo d for most d up to X 6 ) 
then we will be fine so long as D = o(X 612 ). Un- 
conditionally, that is to say using just the Bomb- 
ieri-Vinogradov theorem, we may only take the 
sieving level D to be about X 114 , which means 
v(n) gives a weaker notion of (n + h 1 )...(n + h k ) 
being almost prime. 

At this point one must dirty the hands by do- 
ing an actual computation of the numera- 
tor and denominator of (1) with a judicious 
choice of the weights \ d . Making a sensible 
(by which we mean more-or-less optimal) 
choice, and taking D to be almost X 6/2 , one 
eventually computes that the ratio in (1) is 

2 e 

(l + fc-l/2)2 

Remember that k is the number of elements in 
our admissible tuple{^i+" I should say 


that I dont think I could motivate the result 
of this computation particularly well, if at all, 
even to an expert audience. I’m not such there 
even is a conceptual explanation of it - you just 
have to do it. 

Recall that we wanted the ratio to be greater 
than 1: this would give us bounded gaps be- 
tween primes. Even if k , the number of ele- 
ments in our admissible tuple, is very large one 
does not achieve this if 6 < 1/2. This is pretty 
unfortunate, since we can only proceed uncon- 
ditionally when 6 < 1/2. As soon as 0 is even a 
tiny bit larger than 1/2, however, the ratio will 
indeed be larger than 1 provided that k is big 
enough, and we will get bounded gaps between 
primes as discussed above. The value 6 = 1/2 
is thus a crucial barrier for the GPY method: 
with 6 < 1/2 one gets very little, whilst with 6 > 
1/2 one obtains bounded gaps between primes. 

This concludes our cursory discussion of the 
GPY method, which links bounded gaps be- 
tween primes to the distribution of primes in 
progressions. We turn now to the other side of 
the story, in which the aim is to understand as 
much about the latter as possible. 

Primes in arithmetic 
progression 

We turn now to a description of Zhangs major 
advance, the proof of Theorem 5. Let us recall 
the statement. 

Theorem 7 (Zhang) 

We have the weak Elliott-Halberstam conjec- 
ture EH weak (6) for 6 = 1/2 + 1/1168. That is, the 
primes are satisfactorily distributed modulo d 
for most smooth values of d up to X 6 . 

We did not say exactly what satisfactorily dis- 
tributed means, but it basically means that we 
are interested in showing that 7 if (c,d) = 1 then 

x=c (mod d) 

7 In fact, this only needs to be shown when c be- 
longs to a specific and fairly small set of residue 
classes varying in a multiplicative fashion with d, 
but will not mention this point again. 
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for most smooth d < X e . Remember that by 
smooth we meant that all prime factors of d 
are at most X 8 for some very small 8. The way 
in which this is used is that it allows us to sup- 
pose that d is welhactorable, which means that 
for any Q,R with QR ~ d we can hnd a factori- 
sation d = qr with q ~ Q and r ~ R. Exactly how 
this property is used is not very easy to explain 
properly, but bear with us. One can, however, 
immediately observe that if q,r are coprime 
then the condition^ = c (mod d)is equiva- 
lent to conditions on x modulo q and modulo 
r, by the Chinese remainder theorem, so there 
is a sense in which we are reducing the size of 
the moduli and thereby making the problem 
simpler. 

At this point in the exposition it is slightly 
convenient to work with averages rather than 
sums, which we notate using the probabilistic 
E notation, though there is nothing random in 
our discussion. Our task is more-or-less equiv- 
alent to estimating an average over primes of 
the form ^ 

E x < x A(x)i(>(x) = — ^2 A (x)ip(x) 

x<X 

where in this case 

^(x) 1 x=c (mod d) ^-(x,d)= l/0(^0 (6) 

and the goal is to show that this average is ap- 
preciably smaller than the c< trivial” bound of 
about 1 which comes from the prime number 
theorem. 

To attack averages like this, we introduce a no- 
tion that is central to additive prime number 
theory: that of expanding in terms of bilinear 
forms. Suppose that instead of the above aver- 
age we were instead asked to estimate 

E X < x (a * P)(x)i/j(x) ( 7 ) 

wherea,/3are arithmetic functions with 
\a(m )\, \(3(n)\ < 1, and with a(m) support- 
ed on the rangem ~ Mand/3(n)on the 
rangen ~ N, where MN = X. Here, * de- 
notes Dirichlet convolution, that is to say 

(a*P)(x)= a(m)/3(n ) 

mn=x 

The function pj is completely arbitrary for the 
purposes of this discussion, except we assume 
that \^{ x ) I < 1 for all x. We even let ^ be 


complex-valued. The sum (7) can more-or-less 
be rewritten as 

E m ^ M ^n~Na(m)l3(n)ilj(mn) 

Applying the Cauchy-Schwarz inequality, the 
square of this is bounded by 

E m ^M \'E n ^ N P(n)'tp(mn)\ 2 _ 

= E ntn >^ N /3(n)/3(n')E m ^M'>P(rnn)^(mn') 

Applying Cauchy-Schwarz a second time, the 
square of this is bounded by 

En 1 n'~Jvt I m'~M^(™)^(mn , )^(rn , n)^(mV) (8) 

The key thing to note here is that the unspecihed 
hmctions», /3 have completely disappeared, and 
we are left staring at an expression involving only 
'ip. Perhaps we might hope to estimate it, the 
aim being to improve substantially on the trivial 
bound of 1. If we can do this, we have a kind of 
“certihcate” which asserts that^always gives non- 
trivial cancellation in averages such as (7), no 
matter what a and /3 are. 

There are two rather obvious barriers to this 
observation being at all usehil, and they are the 
following. 

(i) We in fact wish to estimate the average 

E x < x A(x)tp(x) 

but we have said nothing about the extent to 
which A can be expressed in terms of Dirichlet 
convolutions ol * /3, nor even offered any 
motivation for why this should be expected. 

(ii) We are interested in a specihc hmction f, 
given by (6). Why should this v|/ allow us to pro- 
vide a certihcate by exhibiting nontrivial cancel- 
lation in (8)? 

With regard to (i), many readers will know that 
A = (i * log , where \i is the Mobius hmction. 
However, this turns out not to help greatly in the 
above scheme. The reason is that in practice we 
will only be able to estimate expressions such 
as (8) for quite restricted ranges of M and N, 
usually with M and N close in size, and with 
the decomposition fi * log there is no 
opportunity to seriously restrict these ranges. 

A successhil technique depends on a remarkable 
identity of Linnik, or rather on a kind of trun- 
cated variant of it due to Heath-Brown, which we 
shall not state. The identity states that 
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A (n) 
logn 


V (-i) fc 

^ k 


T 'k ( n ) 


where r\(n) is the number of ways to factor n 
= n 1 ...n k with n { > 1 for all i. If one knows the 
dehnition and very basic properties of the ( func- 
tion, the proof is just a couple of lines long: ob- 
serve that 

oo 

logC(s) = iog(i + (C(s) - i)) = - i) fc 

k= 1 

and compare coethcients of n~ s on both sides. 
The right hand side is pretty obviously the right- 
hand side of Linniks identity. As for the left-hand 
side, its derivative is C'( s )/C( s )> which is well- 
known to be -S A ( n ) n “ s . Indeed, it is precisely 
this relation that links primes and the (-function. 
Integrating with respect to s, we obtain Linniks 
identity. 


Note that r\(n) is a Dirichlet convolution of k 
copies of the function which equals 0 at 1 and 
is 1 everywhere else. Chopping the domain of 
this function into various ranges, we can indeed 
write A as a sum of a number (not too large) of 
convolutionsa * /3, with a(m) supported where 
m ~ M and /3(n) where n ~ N , with consid- 
erable flexibility in arranging the ranges M and N 
we need to worry about. The precise arrangement 
of these ranges is a rather technical matter, and 
suthce it to say that Zhang classiAes them into 
three diAerent types (plus a somewhat trivial 
type): these are called Type I, II and III. Actually, 
the Type III sums in fact involve certain 4-fold 
convolutions Oi\ * Oi% * (T 3 * OL/\. 


What about (ii), that is to say the issue of obtain- 
ing a “certiAcate” for ^ which certiAes that aver- 
agessuchas E x <x(ot * /3)(x)ip(x) 

exhibit cancellation? One may note that ( 8 ) is cer- 
tainly not always o(l). Rather trivially, when ^ is 
the constant function 1 we get no cancellation. 
The same is true if \i>(x)\ = 1 and if V 7 is mul- 
tiplicative in the sense that <t>(mn) = <j>(m)<j)(n), 
as can be easily checked. (One consequence of 
this observation is that the whole scheme we 


have just outlined is somewhat unhelpful for 
showing that A does not correlate with a single 


Dirichlet character y.) However, our function 


ljj(x) \ x = c ( m od d) 


'~(x,d )=1 


does not obviously exhibit multiplicative behav- 


iour and therefore one can hope to produce a 
certiAcate for this at least on average over d. 


The reader will not be surprised to hear that the 
above discussion was an oversimplification, al- 
though it captures something of the key ideas. 
There are other types of “certiAcate” than (8). The 
key tools that are brought to bear on estimating 
averages such as ~E x < x (a * P)(x)3j(x) with 
our particular choice of ^ are: 

(i) Cauchy-Schwarz, similar to the above; 

(ii) Fourier expansion, for example of 

(iii) Shitting the range of summation, that is to 
say replacing the E n ^ N F(n) by 
E nr ^ N E\ k \< K F(n + k) , which should be 
roughly equal to it if K < N; 

(iv) Certain changes of variable and substitutions; 

(v) Completion of sums, that is to say replacing 
an incomplete sum A x )by sums x ^/ d J^ ed ^ 
where/ c Z/dZis an interval, and e d (x) := e 2mx / d m 

Considerable extra flexibility is available under 
the “well-factorable” assumption that d = qr ; in- 
stead of averaging over d < X e , one now 
averages over both q and r, and this affords still 
more opportunity to vary the application of the 
above four ingredients. The application of (i) to 
(v) above (in various combinations) throws up 
other sums that need to be estimated. The most 
interesting such case for Zhang occurs in his 
treatment of the so-called Type III sums, where 
expressions such as the sum 

E ed{ jk + {frry +hin+h2n ' ){9) 

n,n',l (mod d) 

are relevant. Here, hph 2 >k are parameters, and 
the sum over n,n\l is restricted somewhat, in 
particular to n,n,l,l+k 3 0 so that it makes sense. 

It is very hard to explain how an expression 
like this comes up without going through the 
applications of each of the techniques (i) to 
(v) (which all occur here) in turn. SuAice it 
to say that the k comes from shitting as in 
(iii), the h p h 2 come from (v) (followed by an 
application of Cauchy-Schwarz) and the 
instances of c/n ultimately come from an initial 
Fourier expansion of Lx=c (mod d ) where x = mn. 

Now it turns out that (9) is in fact bounded 
by (essentially) d 3/2 , apart from in certain 
degenerate cases This is about as good an 
estimate as one should hope for, since it 
represents essentially square-root cancellation, 
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as the number of things being summed over is d 3 . 
This is by no means a trivial fact, and depends on 
cohomological ideas of Deligne, and in particu- 
lar the Riemann hypothesis over hnite helds. 
When d= p is prime, the sum (9) may perhaps 
be written more suggestively, to an algebraic 
geometer, as 

e p{% 1 + X 2 + £3 + X A ) 

(a?!, X2,X3 ,Xi) 6V 

whereR c F*is a variety ax A x 2 + = jxix 2 x s x A 

In other cases (the analysis of the Type I and II 
sums) more standard sums such as Klooster- 
man sums ^ ep ^ 1 + X2 ^come up. This particu- 
lar sum is bounded by 2^/p, a famous bound 
of Weil which does admit an elementary (in 
the technical sense) proof due to Stepanov. 
This is not the case with the Bombieri-Birch 
bound,which needs the whole of the algebro- 
geometric machinery. 

The great majority of these ideas can be found in 
the work of combinations of Bombieri, Fouvry, 
Friedlander and Iwaniec. The sad thing, however, 
is that this excellent and optimal bound of d 3 ' 2 
just fails to cancel out some losses coming from 
other manoeuvres, particularly the completion 
of sums manoeuvre (v) which is rather costly 
if the length of I is much less than d. Instead of 
a valid certihcate for one simply recovers, 
essentially, the trivial bound for (8) - not, perhaps, 
the most spectacular use of the deep machinery. 


making the argument work and establishing 
Zhangs remarkable theorem. 

Polymath 8a 

Shortly after Zhangs paper came out, Terence Tao 
orchestrated a collaborative project to reduce the 
value of H as far as possible from Zhangs value of 
70000000, and more generally to understand all 
aspects of Zhangs work. One of the many great 
things about this project, in my view, was that 
without it there could well have been thou- 
sands of papers improving H in various different 
ways. One big achievement of this project was to 
increase Zhang’s 1/1168 to 7/300, that is to say to 
prove EH weak ( 1/2 + 3/700), and to rehne various 
other aspects of the argument, including the sieve 
theory and the computation determination of 
admissible tuples, so as to eventually reduce H to 4680. 

Another signihcant achievement of the project was 
to remove the dependence on the deepest algebro- 
geometric results, although if this was ones concern 
then the value of H could only be taken to be 14950. 
At this point, the entire argument could reasonably 
be presented from hrst principles in an advanced 
graduate course. (Personally, I found it extraordi- 
nary that initially one could only get bounded gaps 
between primes using the full force of Delignes ma- 
chinery, and no hnite bound without it.) 

Maynard-Tao 


The crucial new innovation of Zhang is to 
exploit the presence of the factorisation d=qr 
(which, remember, can be selected quite flexibly). 
One might imagine that, by the Chinese remain- 
der theorem, one obtains a product of sums 
similar to (9) modulo q and modulo r, leading 
to a bound of (qr) 3/2 and no eventual gain. What 
Zhang miraculously finds, however, is that by 
making sure the shift in (iii) is by a multiple of 
r, the sum modulo r is not of Bombieri-Birch 
type (9), but degenerates to something like 


E 

si,S2,n (mod r) 
si,S2,n^0 


e r { 


S 1 — g 2 x 

sis 2 n ' 


This exhibits better than square root cancellation, 
being of size essentially r, as one can easily check. 
(In fact, this is a slight simpliAcation: Zhang actu- 
ally obtains a Ramanujan sum in which the vari- 
able n is constrained to be coprime to r, but the 
better-than-square-root cancellation still holds). 
Thus instead of (qr) 3/2 Zhang gets instead q 3/2 r , 
and the factor of r 1/2 thus saved is crucial in 


As Polymath 8a was nearing completion, James 
Maynard and Terry Tao simultaneously made a 
dramatic advance which vastly simpliAes the whole 
argument. Students wishing to study bounded gaps 
between primes may now read the 23-page paper of 
Maynard, freely available at 

http://arxiv.org/abs/ 1311.4600 . 

In the Maynard-Tao argument, one still needs in- 
formation about the distribution of primes in pro- 
gressions, but things now work with any positive 
value of 6 > 0: the GPY barrier of 6 = 1/2 turned 
out to be somewhat illusory. The crucial new idea of 
Maynard and Tao is to consider a diAerent weight 
hmction v. 

A second Polymath project, again led by Terence 
Tao, has been working on the problem in the 
light of the Maynard-Tao development. When 
the first version of this paper was submitted on 
20/2/14 the record value of H was 264, but when 
I came to correct some typos a few days later this 
value had already been reduced to H = 246. 
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The elliptic modular function, j, invariant under 
PSL(2, z), has Fourier expansion 

EM 3 


j{q) = 


1 


A (q) 


= Y Cmq " 

m =—1 


= - + 744 + 196884g + 21493760g 2 + ... 

Q 

as z -> z°°, where q = e? niz is the nome for z; 


(1) 


vast subjects encompassed) in which the j -invariant 
and the Leech lattice are central. As we ponder the 
meaning of life, we should be aware of the prescient 
remarks of the author [A], Douglas Adams: 

“The Answer to the Great Question . . . is . . . For- 
ty-two,” said Deep Thought, with inhnite majesty 
and calm. 


E±(z) = 1 + 240 cr 3 (n)q n 

n= 1 

is the theta series for the E 8 lattice, cr 3 (n) = ^d 3 and 

d\n 

oo oo 

Z(q) = qH(l-q n r = Y Tm( i m 

n =1 m= 1 

= q- 24q 2 + 252 q 3 - U72q 4 + 4830 q 5 + ... (2) 

is the modular discriminant [S]. Ihere are two new 
congruences. 
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The vector co = (0, 1, 2, ... , 24,70) lives in the Lo- 
rentzian lattice II 25>1 in 26 dimensions as an isotro- 
pic Weyl vector [C], allowing us to construct the 
Leech lattice as co^/co. Watsons [L, W] unique 

n 

non-trivial solution to X/ 2 = m2 is ( n , m) = (24, 70). 


Indeed, the second authors observation 35 years 
ago that 

196884 = 196883 + 1 (3) 
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1: Describing the noise 

One of the key tasks in Image Processing consists 
in removing interferences in the image coming 
from acquisition problems due, for instance, to 
external factors, like temperature or illumination. 
We have all experienced similar issues when tak- 
ing a picture of a very dark scene, for example in 
a clear and fresh summer night when looking at 
dazzling stars shining in the sky. Even with pro- 
fessional cameras, the result may look grainy, not 
clean nor clear. We say that the image is corrupted 
by what is called noise , the granular component 
we want to remove in order to get a clean version 
of the image. Depending on applications and on 
the physical properties characterising the trans- 
mission, acquisition and processing steps, dif- 
ferent types of noise can be considered. In many 
cases, the noise is assumed to follow a Gaussian 
distribution, Fig. la. This assumption relies, main- 
ly, on a probabilistic asymptotical property of 
noise distributions, called Central Limit Theorem. 
According to this result, the sum of independent 
random variables of any distribution converges, 
as the number of measurements goes to inhnity, 
to a Gaussian-distributed random variable. Very 
often, though, this very reasonable assumption 
does not model realistically the actual physical 
source of noise corrupting the data. For instance, 
in the case when transmission problems ' switch 
off’ just some of the pixels in the image, a differ- 
ent, not everywhere-spread noise distribution is 
preferred: the noise model that arises is called im- 
pulse noise, Fig. lb. Finally, in astronomical Im- 
aging applications, physical properties of the light 
rehecting its quantised nature have to be consid- 
ered. This results in a photon-counting process 


which mathematically relates to a discrete Poisson 
probability distribution, Fig. lc. Differently from 
the additive nature of the Gaussian-type noise, 
Poisson distribution is signal dependent, that is 
the intensity of the noise depends on the bright- 
ness of the regions in the image: brighter regions 
will present higher level of noise. In Fig. 1 we can 
observe some examples of the noise distributions 
described. Many more can be considered: nor- 
mally they model signal-dependent noise distri- 
butions in the form of multiplicative noise (like 
speckle noise, Rician noise. . . ) arising, for in- 
stance, in MRI applications. However, the mathe- 
matical modelling and analysis of these models is 
rather involved, so for what follows we will focus 
on the three noise models (Gaussian, impulse and 
Poisson) mentioned above. 

2: Image denoising 

The task of image denoising can be described 
mathematically as an inverse problem, that is giv- 
en our noisy image/, we want to reconstruct the 
noise-free image u such that: 

/ = T(u) (i) 

In (1) the operator T models the degradation 
process u goes through: in our case this can be 
thought of as the operator that encodes the phys- 
ical properties responsible of the noise present in 
the image. Other choices of T can be made: for 
instance, T can represent a blur operator or di- 
vide regions in the image where the information 
is available from others where the information 
has been lost (like, for example, for dis-occlusion 
problems). 



(a) Gaussian noise. 




(c) Poisson noise. 


Figure 1: Different noise distributions for different Imaging applications. 
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What is important to highlight here is that, in gen- 
eral, problem (1) is not easy to solve: uniqueness 
of the solution and/or its stability with respect to 
the initial data can fail. These problems are nor- 
mally said to be ill-posed. In order to get a solution 
of (1) an alternative formulation has to be consid- 
ered. A traditional approach consists in regular- 
ising the problem by adding some a-priori infor- 
mation describing the properties of the solution 
we are looking for. In our Imaging framework 
the question then is: what are the fundamental 
properties of an image? The answer is: the edg- 
es. Thanks to the edges in the image, objects can 
be identihed from their background and details 
within objects are distinguishable: as such, every 
regularisation procedure used to reformulate (1) 
needs to encode such features. 

2.1 Designing a tailored method 

In order to derive a regularisation model suitable 
for our Imaging tasks, a careful mathematical 
modelling is needed. Images have to be inter- 
preted as functions dehned on an image domain 
(the bi-dimensional grid of pixels) and associated 
to a number or a set of numbers corresponding to 
either the greyscale or the RGB (red, green, blue) 
intensity values, respectively. The modelling of 
images in function spaces is rather a delicate point 
in the design of an optimal reconstruction meth- 
od: the choice of the hmction space itself rehects 
in the expected regularity of the image we want 
to reconstruct. Function spaces which allow too 


much regularity have to be discarded as they will 
destroy the main structures in the image because 
of their strong smoothing properties, see Fig. 2. 
We refer the reader interested in knowing more 
about Image modelling and the choice of suitable 
function spaces for Imaging tasks to [1, 7]. 

Over the last twenty years non-smooth regularis- 
ers have been studied. Namely, Imaging commu- 
nities have focused their attention on Total-Vari- 
ation (TV) regularisation metods. In words, TV 
can be thought of as a regularising term that while 
smoothing out the unwanted noise from the im- 
age, preserves its fundamental and geometrical 
structures, such as edges, compare Fig. (2c). In 
more mathematical words this corresponds to 
consider a weaker norm of the image gradient 
which does not enforce too much regularity, but 
identihes contours and edges in the image. 

A general regularisation method of inverse prob- 
lems like (1) has the form: 

hnd v such that v minimises 

J{v) := R(v) + \(f>if,v) ( 2 ) 

The approach (2) is an example of energy-mi- 
nimisation approach: we model our denoising 
problem by assuming that a scalar quantity /, the 
energy, is associated to our problem. Our solution 
u will be the function in correspondence of which 
the energy / achieves its minimum. 
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Figure 2: The choice of the regularisation term and the hmction space affects the reconstruction: 
smooth regularisations remove noise though destroying fundamental structure; non-smooth 
regularisations (such as Total Yariation) removes the noise, while preserving edges. 




Figure 3: T.L.: Acquired noisy image. T.R.: Denoised image 
with large A. The regularisation is poor and the noise is still 
present in the reconstruction. B.L.: Optimal denoised im- 
age. B.R.: Overregularised denoised image.Fundamental 
structures have been destroyed. 

We model / as the sum of two different terms: 
the regularisation term R(u) which shall encode 
the properties listed above and the hdelity term 
/)which models in a mathematical way the 
statistics of the noise, i.e. its probability distribu- 
tion. The parameter A balances the effect of the 
regularisation against the hdelity term and, heuris- 
tically, can be thought of as a quantity that mea- 
sures how much we trust the acquired data. In the 
following we want to focus on its optimal choice 
which is a fundamental issue in applications where 
normally the right value of this parameter is found 
by a trial-error method. For medical applications, 
the choice of such parameter is crucial. Figure 3 
shows some different TV reconstructions of the 
noisy image of a brain. 

For medical purposes we would not like a poor re- 
construction of our measured noisy image, which 
would not remove noise that could possibly oc- 
clude some regions of interest. On the other hand, 
we would not like regularise our image too much 
and lose meaningful anatomical structures. Some- 
where between these two extremes we look for the 
optimal balance between trust in the data and noise 
regularisation: as mathematicians we want to hnd it 
in a sensible, realistic and automatised way. 

3:The training idea 

Let us stick for a moment to the medical imaging 
framework. In such applications, like MRI for in- 
stance, the accuracy of the measurements can be 
tuned. In general, we can think the accuracy to be 
proportional to the image acquisition time. 


High-acquisition times will result in clean, almost 
noise-free images which will approximate well the 
ground-truth, while low-acquisition times will cor- 
respond to blurry and noisy images. 

In clinical practice, ideally high acquisition times 
are not feasible due to the computational costs 
and the reluctance of patients to stay for too long 
in MRI scanners for examinations. On the other 
hand, standard clinical acquisition times do pro- 
duce some noise in the image, so a pre-processing 
step is fundamental for any subsequent diagnostic 
image-based evaluation and therapy planning. For 
a realistic and tailor-made denoising model we will 
exploit in the following two main ideas: a learning 
approach to make the estimation of the optimal 
balance robust and a correct modelling of the dif- 
ferent noise distributions in order to cover the dif- 
ferent noise models presented in Section 1. 

3.1 Learning the noise via training sets 

The idea of using database of images is quite real- 
istic in applications. In medical Imaging, specially 
designed objects called phantoms are used to anal- 
yse and tune scanning devices. These objects may 
resemble anatomical structures of the human body 
and because of their design they provide consistent 
and reliable results. Based on this, we describe in 
the following our learning idea: 

1. We use fixed devices producing, at each scan, the 
same type of noise in the image, whose level (i.e. 
intensity) is unknown; 

2. A previously-acquired (or simulated) database of 
images, typically of phantoms, is available. We as- 
sume we have two different versions of each image 
u k (k=l,...,N) in the database: a clean, almost noise- 
free version u k acquired with high acquisition times 
and a second version of it f k , acquired within stan- 
dard medical acquisition times and consequently 
corrupted by noise. 

We aim to find the optimal X for the problem (2) 
such that for every k, the TV reconstruction of f k 
matches at best the corresponding noise-free u k 
version of the same image [8]. Because of our 
assumption 1. above, we shall now use the comput- 
ed X as an optimal parameter in order to process a 
new, not-simulated image, such as a real MRI scan 
of the brain of a patient. 
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Figure 4: Sample of 5 images of OASIS MRI brain database: original images (upper row), noisy images 
(middle row) and optimal denoised images (bottom row), X opt = 3280.5. 


In Figure 4 we show an example of a simulated da- 
tabase of MRI scans of brains: the brst two rows 
represent the two noise-free and noisy yersions of 
the images in the database, respectively, whereas 
the third row contains the denoised images ob- 
tained with the optimal parameter \ computed as 
described above. Further work has to be done in or- 
der to improve upon the characteristic ' watercolor 
efFect in the computed reconstruction: this, in fact, 
relates to the properties of the TV regularisation 
term used. More general regularising terms R(u) 
can improve upon this property, like, for instance, 
the Total Generalised Variation, [4]. 

For this example the noise has been assumed to be 
Gaussian-distributed, but in the following we will 
comment briefly on how to incorporate difFerent 
noise models, as discussed in Section 1. From a 
computational point of view, solving non-smooth 
problems of the form (2) is very demanding, es- 
pecially as the number of images in the database 
becomes very large, which is desirable in order 
to make the noise estimation robust. Due to our 
modelling assumptions 1. and 2., though, some 
sampling strategies can be used to improve upon 
efficiency, [6]. 

3.2 Optimal modelling 

The regularisation approach (2) can accommodate 
easily different noise distributions like the ones cor- 
responding to impulse or Poisson noise. Depend- 
ing on the application considered, 


Heuristically, quadratic-type hdelity terms are gen- 
erally considered for Gaussian noise distributions, 
whereas the modulus of the difference between u 
and/is preferred for impulse noise distributions. 
Finally, more sophisticated, logarithmic-type, hdel- 
ity terms are used for Poisson noise distributions 
[3]. When just one single noise distribution is as- 
sumed to corrupt the measurements, the approach 
described above can still be employed, by simply 
using the suitable hdelity term for describing the 
problem. But one can ask the question: what if mul- 
tiple noise distributions are present in the image? 
As explained, each of them can be the result of dif- 
ferent acquisition/transmission problems, so the 
combined presence of noise is perfectly reasonable 
in applications. 

An immediate extension of (2) for the mixed-noise 
case would be considering a model that, for the 
very easy case of two noise distributions, reads as: 

hnd v such that v minimises 

J(v) ±*R(v) + + (3) 

that is a model that describes the joint presence 
of two noise distributions through the sum of the 
corresponding hdelity terms. The same strategy 
described in Section 3.1 would then, heuristically, 
determine the optimal size of the parameters A 2 and 
A 2 which on one side will resemble the data htting 
with respect to the degradation due to each of the 
two noises present in the image and, 
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Figure 5: Noise decomposition through infimal-convolution: 
the mixed-noise distribution is assumed to be a combination of 
Gaussian and impulse noise. 

on the other side, would balance against the size of 
the regularisation, as before. 

More complicated operations can be considered 
in order to solve this task. For instance, applying 
the discrete analogue of the convolution operator 
called infimal-convolution [2, Chapter 12] to 0i 
and 02 one can get from the model the additional 
property of noise decomposition into its compo- 
nents, compare Fig. 5. 

4: Condusions 

The recipe for image denoising requires different 
ingredients. First of all, a careful understanding of 
the physical and statistical properties characteris- 
ing the problem is needed in order to formulate an 
appropriate mathematical model. This reflects in 
the correct choice of the data fidelity term by which 
we can mimic the noise distribution corrupting the 
data. A fundamental aspect then is also the opti- 
mal choice of parameters which will result in an 
optimal image reconstruction. Training the model 
using database of images seems to be a promising 
and reliable strategy to design reliable and efficient 
image denoising methods, [5]. 
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E very once in a while, I hear puzzles about 
100 prisoners and a meticulous, demanding 
warden. All of these puzzles share a com- 
mon characteristic: a set of prisoners must work 
together to devise a clever scheme to thwart the 
warden. I hear these puzzles often enough that 
each time they reappear, I view them with an in- 
creased level of understanding corresponding to 
the stage of my mathematics education. 

Any problem may have a solution, but, some- 
times, that solution may not be the most ethcient 
one possible. For example, suppose you want to 
hnd something in your room but dont remember 
where you put it. You can either search the room 
by yourself. Or you can call your (many) friends 
to help you. From a correctness standpoint, both 
solutions are correct. Youllhnd whatyourelook- 
ing for eventually. But from an algorithmic stand- 
point, the second solution where you search in 
parallel with your friends is better because it has 
a shorter runtime. The same holds for solutions 
to the 100 prisoners puzzle. Some solutions may 
be theoretically correct answers to the puzzle but 
may have expected runtimes that exceed the lifes- 
pan of an average person and are, thus, practically 
undesirable. Now, through this lens, the lens of a 
theoretical computer science student, I would like 
to present to you the 100 prisoners puzzle and its 
variants. 

100 Prisoners and a Light Bulb 

The original, very famous puzzle involving an in- 
terrogation room, a light bulb, and 100 prisoners 
is the following (paraphrased from Wu in [3]): 


One hundred prisoners just arrived in prison. The 
warden tells them that starting tomorrow, each of 
them will be placed in an isolated cell, unable to 
communicate amongst themselves. Each cell has a 
window so the prisoners will be able to count the 
days. Each day, the warden will choose one of the 
prisoners uniformly at random with replacement, 
and place him in a central interrogation room con- 
taining only a light bulb with a toggle switch. The 
light bulb is initially switched off. The prisoner may 
observe the current state of the light bulb. If he 
wishes, he may toggle the light bulb. He also has 
the option of announcing that he believes all pris- 
oners have visited the interrogation room at some 
point in time. If this announcement is true, then all 
prisoners are setfree, but ifit isfalse, all prisoners 
are executed. The warden leaves, and the prisoners 
huddle together to discuss theirfate. Can they agree 
on a strategy that will guarantee their freedom? [3] 

One common solution to the puzzle is to divide 
the days into 100-day blocks and instruct any 
prisoner to toggle the light off if he is interrogated 
twice within the same block. The hrst prisoner of 
each block turns the light on and the last prisoner 
checks whether the light is still on when he enters 
the interrogation room at the end of the 100-day 
block. If the light is still on and he did not enter 
the room on any previous day within the block, he 
declares that all prisoners have visited the interro- 
gation room [3]. This solution is technically cor- 
rect because it guarantees the prisoners their free- 
dom, but the prisoners are expected to be freed 
after 1.072 xl0 44 days, in years « 10 31 times the 
age of the universe. From an algorithmic stand- 
point, this solution is rather poor because it has 
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an expected runtime of 0(n 1/2 e n ) [3] for n prison- 
ers. 

The challenge now is to find a solution that is cor- 
rect and also has an optimal runtime. Such a solu- 
tion is more likely to guarantee that the prisoners 
are freed while they are still alive. 

The canonical (better) solution is to designate a 
“leader” to be the person who counts the number 
of unique prisoners who have been interrogated. 
Tbe leader may do so by counting the number of 
times the light bulb has been switched on. A pris- 
oner who has not yet toggled the light switch will 
turn the light on if it is currently otf. A prisoner 
will do nothing if he enters the room when the 
light is currently on. Tbe leader turns the light off 
each time she leaves the room and increases her 
counter when she sees a light that is on. Thus, after 
counting 99, the leader may declare that all the 
prisoners have been interrogated at least once. (It 
is suthcient to count to 99 because the leader her- 
self counts as the last prisoner.) 

How long are the prisoners expected to wait? 
Suppose that T represents a counter for the num- 
ber of times the bulb has been switched on. We 
may count the expected number of days until T 
= 99. Let X { denote the number of days that pass 
between an increment of the counter from when 
T = i until T = i + 1. Let Y { denote the number 
of days from when a leader turns off a light bulb 
until a prisoner turns on the light. Let Z ? denote 
the number of days from when a prisoner turns 
on the light until the leader enters the room to 
see the newly turned on light bulb. Thus, X~YfvZ { . 
Let X be the number of days the strategy requires 
in total before the prisoners are freed. Given n 
prisoners, the probability of turning on the i th 
light is . The probability that the leader enters 
the room on any day is K By linearity of expecta- 
tion, 

E[X} = E[Xi] = + E[Zi]) 


i =1 
n— 1 

=£ 


n — i 


T + n) = n 2 — n + nH n _\ 


is 10417.J days or approximately 29 years [3]. 
Though still a long time, it is within the prisoners’ 
lifespans. 

Wu [3] further summarized some strategies 
that may lead to even shorter wait times. One 
such strategy achieves an expected runtime of 
0(n(logn) 2 ). The key insight behind this algorithm 
is to allow “assistant” leaders to help the leader 
by doing some of the counting. Then, the leader 
would sum together the totals of all the “assistant” 
counts to determine if all prisoners have visited 
the interrogation room. To do this, we must be 
able to divide up the counting of the light bulbs 
into blocks of days. There must be a block for as- 
sistants to count the number of prisoners and a 
different stage for assistants to tell the leader their 
total [3]. See [3] for more details. 

But can we achieve a solution with an even bet- 
ter runtime, for example, a solution with an O(n) 
expected runtime? Turns out, the answer is no 
for the O(n) runtime solution. It is a common 
joke among CS theoreticians that we hate lower 
bounds because it prevents us from making bet- 
ter algorithms. The reason why we cant create an 
O(n) algorithm for the 100 prisoners problem is 
precisely that a lower bound prevents us from do- 
ing so. The expected number of days for all pris- 
oners to enter the interrogation room at least once 
is O(nlogn), therefore no strategy, no matter how 
clever, may achieve a better expected runtime 
than O(nlogn), [1]. A simple calculation contirms 
this lower bound. Let the random variable X t be 
the number of days until the i-th unique prisoner 
with probability of selection 71 ~ 1 + 1 ispicked. 


e[x] = j 2 e ^ = T2 


n — i + 1 


In asymptotic notation, the “leader” algorithm 
has an expected runtime of 0(n 2 ) days [3]. When 
there are 100 prisoners, the expected wait time 


= n \ - = 0(n log n) 

i= 1 

Naturally, when the original problem has been 
solved, we wonder if the solution still applies for 
variants of the problem. Some of these solutions, 
like the leader and the 0(n(logn) 2 ) solutions, de- 
pend on certain characteristics of the problem 
like the ability to tell time. What if we took away 
these abilities? Below, I present some harder in- 
stances of the 100 Prisoners puzzle and challenge 
you to find more ethcient solutions for them. 
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Variations of the 100 Prisoners 
and a Light Bulb Puzzle 

We assume for Problems 1 and 4 that the following 
are true: All prisoners are allowed to discuss their 
strategy on the hrst day. On the next day, they are 
each placed in an isolated cell with a window. The 
interrogation room contains a single light bulb 
that is initially switched off. 

1. Blue and Red Cells: Each isolated cell is either 
painted completely blue or completely red. 

In addition to declaring that all prisoners 
have been interrogated, a conhdent pris- 
oner must also correctly state the number 

of prisoners in red cells and the number of 
prisoners in blue cells [1]. 

Problem 1 is easily solvable using a strategy simi- 
lar to the “leader” strategy if two light bulbs are in 
the interrogation room instead of one. However, 
with only one light bulb, is it possible to devise an 
0(n 2 ) time algorithm? 

2. Light Bulb May Be Off: We assume that the 
light bulb in the interrogation room may be 
turned on or off initially (i.e. before the hrst 
prisoner enters) [2]. 

If prisoners still have windows in their rooms, 
then the “leader” algorithm still provides an 0(n 2 ) 
solution to Problem 2 because the leader can just 
record all the times the light is on starting from the 
second day. The hrst non-leader prisoner to enter 
the interrogation room must be unique; therefore, 
on the hrst day, he can simply leave the light on if 
it is on or turn it on if it is off. All other prisoners 
behave as before. However, this problem becomes 
trickier if prisoners do not have windows in their 
individual cells because the prisoners have just 
lost their ability to keep track of time. 

3. No Windows: We keep the condition pre- 
sented in Problem 2. Now, prisoners may 
no longer keep track of how much time has 
passed because they are placed in isolated 
cells with no windows and no way to keep 
time [2]. 


This variation is harder because now the leader 
does not know how many days have passed and 
how many prisoners were interrogated before she 
enters the room. She could be the hrst prisoner 
to enter the room and the light bulb could have 
been initially on. In this case, her count of the 
number of interrogated prisoners would be off 
by 1. Can we still achieve an 0(n 2 ) algorithm by 
tweaking the “leader” protocol (the answer is yes 
but how)? The harder question is can we tweak 
the 0(n(logn) 2 ) solution to apply to this problem? 

4. Couple of Prisoners: Let us assume that all 
prisoners arrested were couples. Therefore, 
among the 100 prisoners, there are 50 dis- 
tinct couples (no person may be a member 
of more than one couple). The warden then 
divides each couple. One member of the 
couple is placed in Group A and the other is 
placed in Group B. On each day, the warden 
chooses uniformly at random with replace- 
ment someone in Group A to interrogate in 
the morning. In the afternoon, on the same 
day, the warden chooses randomly someone 
from Group B to interrogate. Couples may 
not switch who theyre partnered with. In 
addition to declaring that all 100 prisoners 
have been interrogated, a prisoner must also 
correctly claim that all couples have been 
interrogated (at least once) on the same day 
[4]. 

There exists a solution that assigns each couple to 
a particular day. The person from Group A may 
only turn the light on when they are called on their 
assigned day. Otherwise, they turn the light off. If 
the person from Group B is also called on their 
assigned day, they will leave the light on if it is on 
from the morning. If the person from Group B is 
called on any other day, they will turn the light off. 
A leader chosen from Group A counts the number 
of unique days she sees a light on when she enters 
the room. This indicates that both members of a 
couple were interrogated on their assigned day 
(the previous day). What is the expected runtime 
of this solution? Does this problem still have a so- 
lution if the prisoners are placed in isolated cells 
without windows? 
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Trading Light BulbsforTime 

For this last problem, I want to see how much 
more power we must give to the prisoners in order 
to bring the expected number of days in jail down 
to O(n). The riddle I created below trivially must 
have an O(n) solution (answer in the appendix). 
Giving prisoners more light bulbs enables them 
to tell each other more information in a shorter 
amount of time. But the more interesting ques- 
tion, now, is can the prisoners escape with less 
than 6 light bulbs? 

5. Prisoners and Vindictive Wardens: The same 
100 prisoners are ushered into prison by 
the same warden. They will be placed in 
isolated cells with windows starting tomor- 
row. Except now, the warden tells them that 
the interrogation room has 6 light bulbs in a 
row, and she will interrogate each prisoner at 
most twice. Prisoners are chosen uniformly 
at random from those that have not yet been 
interrogated twice. The prisoners were tre- 
mendously happy at this news because they 
are guaranteed freedom after at most 200 
days. The warden cackles and tells them that 
there is a catch. This time, when a prisoner 
enters the interrogation room, he is asked, 
“Are you the last unique prisoner?” The last 
unique prisoner must declare, “Yes.” Every 
other prisoner must declare, “No.” Once a 
“Yes” is correctly declared, everyone is imme- 
diately freed. If someone declares incorrectly, 
everyone will be executed. How can they 
guarantee their freedom? 

As a hint, the solution to this problem critically 
depends on the prisoners being able to tell time. 
If the isolated cells do not contain windows, what, 
then, is the minimum number of light bulbs need- 
ed in order to guarantee the prisoners’ freedom? 

Is the Solution Optimal? 

I hope that you will take what I have written here 
to heart so that the next time you look at a puzzle, 
dont just fmd a right solution; hnd the optimal 
solution. 


Appendix 

Answer to “Prisoners and Vindictive Wardens”: 

Despite having 6 light bulbs, the answer to this 
riddle is not as simple as encoding the number 
of prisoners who have been interrogated twice, 
which we could if we had 7 light bulbs. But we 
may use a similar scheme. Let “on” represent 1 
and “off” represent 0, with the rightmost light 
bulb representing the smallest bit. On the i-th day, 
the prisoner who enters the interrogation room 
knows at least [ij unique prisoners must have al- 
ready been interrogated by the pigeonhole prin- 
ciple. Then, using this fact and the light bulbs, we 
may implement a counting system. Let A f be the 
number represented in bits by the 6 light bulbs 
on the i -th day. Every prisoner knows how many 
times he has been interrogated. If the prisoner is 
entering the interrogation room the hrst time, he 
will check whether [lj + A z is 99. If so, then he 
declares, “Yes.” If not, he changes the light bulbs 
such that j^LtlJ + A /+i =[^J + A,- + l and declares “No.” 
If the prisoner is entering the room for the second 
time, he will change the light bulbs such that 
[^J + A i+1 = ||J + A,- and declare “No.” We may see 
this algorithm works for any n prisoners because 
0 < A f < I^J + 1. Never is A f > [|J + 1 because that 
means [| + > n, a contradiction. Furthermore, 

never is A z < 0 because we would contradict the 
pigeonhole principle. For any n prisoners, this 
scheme would work given [iog 2 Q|J + i)J light bulbs. 
May we achieve a better scheme using fewer light 
bulbs? 
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From genes to Chinese 
Restuarants... 

n [6], Warren Ewens introduced what would 
become a foundational tool in the analysis of 
genetic data, a probability distribution that is 
now known as Ewens' Sampling Formula. Tbe 
nearly two decades that had passed since the 
1953 discovery of the double-helix structure of 
DNA had seen the conhrmation the mechanism 
of DNA replication and the discovery of the cod- 
ing relation between DNA, RNA, and proteins - 
in particular the system of triplets of nucleotide 
pairs, called codons , and their relationship to ami- 
no acids - and thus the discovery of the genetic 
code. By the end of the 1960's, new techniques of 
protein sequencing made it possible to confront 
the mathematical models of gene frequencies 
developed by R. A. Fisher, Sewall Wright, J. B. S. 
Haldane, and their successors with data: now, giv- 
en a set of sample of n proteins from a population, 
it was possible to count how many distinct alleles 
appeared, and at what frequencies. This data was 
encoded in the form of an allelic partition , an n- 
tuple of non-negative integers (a 1 ,... y a n ), where a ; 
is the number of distinct alleles that appear ex- 
actlyj times in the sample, so that 

n 

= n ’ 

whereas i=i 

n 

k = dj 

3= 1 

is the number of distinct alleles observed. Some 
examples of allelic partitions, from popula- 
tion geneticists' favorite organism, the fruit fly 
Drosophila , appear in the table below. Thus, for 
example, for Drosophila willistoni , in a sam- 
ple of n - 582 individuals, there were k - 7 al- 
leles, of which the most frequent appeared 559 


times, the next most frequent alleles appeared 
in 11 and 7 individuals, respectively, one al- 
lele was observed in two individuals, and 3 
alleles appeared in exactly one individuals. 

Looking at amino-acid differences (which, due 
to the transcription of proteins from DNA, cor- 
respond to different alleles) in haemoglobin mol- 
ecules across a variety of mammals [9], Mootoo 
Kimura argued that the rate of substitution - the 
rate at which new alleles arising by mutation in a 
single individual replace the ancestral allele across 
the entire population - was much higher than was 
consistent with the neo-Darwinist theory that 
had emerged over the past few decades: Kimuras 
calculations showed that a nucleotide was substi- 
tuted roughly every two years, much faster than 
the once every 300 years predicted by Haldane 
[8] a decade earlier! This discovery led Kimura to 
propose his neutral theory of molecular evolution y 
which argued that "non-Darwinian" forces (the 
randomness in birth and death in small popula- 
tions composed of individuals whose total life- 
time reproductive output is on average equal) as 
opposed to natural selection (some types have a 
higher average total reproduction rate) was the 
primary cause of genetic diversity, sparking a 
Aerce debate that would rage throughout the fol- 
lowing decades. 

Kimura complemented his neutral theory with 
what we now call the injinite alleles model: mu- 
tations happen at approximately a constant rate, 
and given the typical length of genes, e.g. from an 
average 1600 nucleotide pairs per gene for yeast, 
up to an average of 16,600 for mammals [13], the 
probability of seeing the same mutation twice 
is very small. As a first approximation, we can 
thus assume that individuals acquire mutations 
at a constant per capita rate \i y and with every 



38 


Species 

n 

k 

Allelic partiton 

F 

P 

willistoni 

582 

7 

0*559 = 1) on = 1, aj = 1, <22 = 1, ai = 3 

0.932 

0.00690 

tropicalis 

298 

7 

0*234 = 1 } a 52 = l,a± = 2, 02 = 1,02 = 1 

0.6475 

0.130 

equinoxalis 

376 

5 

a 36 i = l,o$ = 1, = 1 , a*3 = 2 

0.9922 

0.0355 


Table 1: Sample allelic partitions, sample heterozygosity, and signihcance values for three species 
of Drosophila, from [15,16] (n.b. unless a value is given, aj = 0). 


new mutation, the mutant individual has new al- 
lelic type that has never appeared previously. To 
complete the description of the model, we need to 
describe the population dynamics: we'll assume 
a population of fixed size N, and assume that 
each individual has a "reproductive clock" that 
rings at constant rate 1. When that clock rings, 
the individual produces an offspring of the same 
type, which replaces another individual chosen 
uniformly at random from the whole popula- 
tion. If this all seems a bit artificial, don't worry: 
although to prove this -- or even be precise about 
the necessary and sufficient conditions -- is a bit 
beyond of the scope of this article, the results 
below continue to hold provided the popula- 
tion is more or less of fixed size, everybody in- 
teracts with everybody, individuals give birth to 
relatively few offspring at any given time, and, es- 
sentially, when no type has a selective advantage. 

Ewens devised his sampling formula as a robust 
statistical test of Kimuras infinite alleles model, 
which, given an allelic partition (a 1 ,...,a n ) in a sam- 
ple of size n , would give the maximum likelihood 
estimate of the total population mutation rate, 
0=N[i, and the probability of seeing that particu- 
lar partition under Kimuras neutral hypothesis: 

Theorem 1 [Ewens' sampling formula] The 
probability of observing the allelic parti- 
tion (a 1 ,...,a n ) in a population of size N that 
evolves according to Kimuras infinite alleles 
model with a mutation rate of \i is 

. n (N 3 

_-_fr (i) 

We'll give a proof of this, but instead of Ewens' 
original, we'll give an easier one that takes advan- 
tage of more recent perspectives and thus makes 
explicit the broad connections of the result (many 
different proofs have been given since Ewens' 
original paper e.g. [4] and [5] give two other ap- 


proaches to the one taken here). In fact, we'll ac- 
tually prove 

Theorem 1' The probability of a sample of 
n individuals containing of k distinct allelic 
types which occur in n { > 0 individuals 
( i=l,...,k, n 1 +...+n k =n) is 

nk k 

0(0 + l)---(0 + n-l)n( ni_1 ) L 

Theorem 1 then follows by a bit of combinato- 
rics: an allelic partition (a 1 ,...,a n ) corresponds to 
the situation where YTj=i a j — k and exactly a ; of 
the values n { are equal to j; all such choices for 
the n { are equally probable (with the value given 
by (2)), so to compute the probability of the al- 
lelic partition, we need only count the number of 
possible sequences, and multiply by (2), which, 
under the assumed allelic partition, becomes 

nk n 

» (t+ 1 )...[« + »-i) n ffi - iBr (3) 

To count the number of ways of partitioning n 
items into groups of size n ly ...,n k , such that there 
are a ; groups of size j, we could start by imagin- 
ing listing the n items, and then taking the first 
n 2 and assigning them to the first group, the next 
n 2 to the second group, etc. There are n! ways of 
forming our initial list, but this over-counts the 
number of partitions: for example, no matter how 
many ways we arrange the first items (which 
can be done in nj ways), they are in the same 
group, and similarly for each of the n { . Thus, we 
should divide n! by Yli=i n i- = TYj=iU ] -) aj to av °id 
this over counting. Further, if, say n { = we could 
swap the corresponding sets of elements from our 
list, and still have a partition into groups of size 
n ly ...,n k . By the same reasoning, if we rearranged 
any of the a ; groups of siz ej, which can be done in 
a ; / ways, we would have the same partition. Thus, 
we must further divide by fI7=i a i ! to get the actual 
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number of partitions: 


n! 

n"=iC? ! )° ia r' 

Multiplying this by (3) gives (1). 


Before turning to the proof of Theorem 1 ', lets 
look at some of the implications (and applica- 
tions) of the sampling formula: hrst of all, let K n be 
the (random) number of allelic types in a sample 
of size n. Looking at (2), we see that the probabil- 
ity of having k allelic types in a sample of size n is 


P (K n = k) = 

where 

S(n,k) := 

{(m,- 


S(n, k)0 k 

6(6 + 1 ) • • • (0 + n — 1 ) ’ 

e n^-D* 

-hn fc =n} i =1 


is a well-known quantity, an unsigned Stirling 
number ofthefirst kind. Named after 18 th century 
Scottish mathematician James Stirling, they count 
the number of distinct permutations of n objects 
with k cycles. Using them, we can partition Ew- 
ens' sampling formula into two components, 


P((ai,... ,a n )) = P (K n = k) P ((a 1? .. .,a n )\K n = k) 

_ _ £(n, k)6 k _ 1 1 r (i) 

6(6 + 1) • • • (6 + n — 1) S(n,k)“ ad 


where only the iirst component depends on 6. In 

particular, we can use the hrst piece as a maxi- 
mum likelihood estimator for 6 that depends only 
on the size of the sample and the number of dis- 
tinct alleles in the sample: 


^logP (tf„ = fc) = !-£ 


9 + i — 1 


so p (K n = k) takes its maximum at the unique 
value of § satisfying 


fc = E 


i =1 


9 

6 + i - l 


By the usual integral comparison argument (see 
appendix), we observe that /c ~ lnn, so that for a 
suhiciently large sample (too large to be very use- 
ful in practice, unfortunately), we can make the 
approximation 9 « ^. 


Note also that 


E [K n ] = 


ELi 

9 + 1 ) • • • (9 + n - 1 ) 


6p(0 + l)---(0 + n-l) 
9(9 + 1) ■ ■ ■ (9 + n — 1) 


= £; 


+ i - 1’ 


so 19 is the value for the (population) mutation 
rate such that the expected number of allelic 
types, E[K r J , is equal to the observed number of 
types, k. 


Given the number of alleles in the sample, we can 
use the latter component, 

'«*■.w 

as a test of neutrality independent of the mutation 
rate [16]: we can do a signihcance test by deter- 
mining the probability of seeing the observed 
allelic partition given the observed number of 
alleles - if it's too low, we can safely reject the neu- 
tral hypothesis. More precisely, one can use the 
Ewens' sampling formula to determine the prob- 
ability P of observing the sample heterozygosity , 



that is, the probability that two individuals, drawn 
uniformly at random with replacement, have the 
same allelic type. For example, in a sample of size 
50 that contains 3 allelic types, the heterozygosity 
can be shown to be always greater than 0.33; (4) 
tells us that the probability that 0.33< F<0.37is 
less than 5% i.e. if we observed a sample with hete- 
rozygosity less than 0.37, we would reject the neu- 
tral model, and assume that selection was at work 
(values from [16]). Table 1 shows values of F and 
P for the Drosophila samples: only for the species 
tropicalis , with P = 0.130 , would we fail to reject 
the neutral hypothesis (when working with data, 
it's important to think like a scientist - we don't 
prove our hypotheses true or fmd counterexam- 
ples, but must content ourselves with rejecting -- 
or failing to reject - them on the basis of statistics). 

Note also, that once we've accepted the neutral 
hypothesis and inferred the value of the popu- 
lation mutation rate, 0 , we have a complete un- 
derstanding of not just the sample, but of the 
relative abundance of all the allelic types in the 
population from which the sample came. In par- 
ticular, the long-run stationary behaviour of the 
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Figure 1: A graphical representation of the inhnite alleles model, with the coalescent process 
superimposed (red). Arrows represent birth/replacement events, whereas one of the "trian- 
gular" species experiences a mutation, causing it to become a new "parallelogram" type. 


population from which the sample came. In par- 
ticular, the long-run stationary behaviour of the 
whole population can be described by a Poisson- 
Dirichlet process with parameter 0, or by the 
closely related GEM (Griffiths-Engen-McClos- 
key) distribution, which gives the proportions of 
each type in decreasing order (the most frequent 
type becomes allele A 2 , the next most frequent A 2 , 
and so-on). A full discussion of these is beyond 
the scope of this article, but we refer the interested 
readertotheextremelygoodpresentationsin [7,5]. 

Now that were (hopefully) convinced that Ewens' 
sampling formula is useful, let's turn to its proof. 
To start, let's describe Kimuras modelin aprecise 
manner: let X { (t) be the number of individuals of 
allelic type A t , i=l,2,..., so 

oo 

j2m) = N, 

i=1 

and assume that (X } (t), X 2 (t), ...) is a continu- 
ous time Markov process, such that in the time 
interval [t,t+ At), we can have two possible types 
of events: either some individual of type i gives 
birth, and its offspring replaces an individual of 
type j (leaving the population size N unchanged), 
or an individual of type i incurs a mutation, which 
causes it to become a new, previously unseen type. 


The transition probabilities for these events are 

P (Xi(t + A t) = Xi(t) + 1, Xj(t + A t) = Xi(t) - 1) 
Xi(t)Xj(t), 


N 


-At + o(At), or 


D (Xi(t + At) = Xi(t) - 1, Xi*(t + At) = 1) 
= fiXi(t)At + o(At), 


where i is the smallest value of i such that no in- 
dividual of type A ? has appeared previously, and 
o(At) represents any quantity such that 


lim 

A40 


o(At) 

At 


= 0 . 


Note that every type i has equal probability of giv- 
ing birth, and each type j has equal chance of be- 
ing replaced, which is what makes this a neutral 
model. 


In Figure 1, we give a graphical representation 
of the inhnite alleles model for a population of 
size N = 6. We imagine lining up the individu- 
als according to some arbitrary (but fixed) order, 
with lines below them on which we will track the 
birth, death, and mutation events. When an in- 
dividual's reproduction clock rings, we draw an 
arrow from their line, pointing at another uni- 
formly randomly chosen line; the individual at 
the tip of the arrow is replaced by the offspring 
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of the individual at the tail of the arrow. When 
a mutation occurs (represented in Figure 1 by 
a small bolt of lightning), the individual at that 
position becomes a new type. Such hgures are 
especially useful, as they indicate how we can 
change our perspective: instead of thinking about 
individuals, we can think about the lines of an- 
cestry that trace back individuals to their par- 
ents, giving us a genealogical tree e.g. in Figure 
1 , the red lines trace the genealogy the sample 
consisting of the three "circle" types (who de- 
scend from a common ancestor at time t = 0) and 
the "parallelogram" (who arises from a mutation 
to a "triangle", but, if we only have access to the 
sampled "parallelogram", we can't infer this, so 
the genealogy stops at the mutation). In popula- 
tion genetics, this ancestral process is called the 
coalescent. It was introduced by J. F. C. Kingman 
in three papers published in 1982 ([10, 12, 11]) 
that revolutionised the mathematical approach 
to population genetics: instead of looking at the 
forward time evolution of the entire popula- 
tion, one could trace back the ancestry of small 
samples, an approach that for neutral models is 
equally powerful, but mathematically quite sim- 
ple. Coalescents for models with natural selec- 
tion, however, remain an active area of research! 

We'll use a coalescent approach to prove (2). Re- 
call that we started with a sample of n individuals, 
which contains representatives of k allelic types, 
such that there are n { individuals with allele A ? , 
i=l,...,k. As we look backwards in time, two types 
of events can occur: two lines with the same allelic 
type can merge into a common ancestor (i.e., one 
is the parent of the other and gave birth at that 
moment), or an allele that occurs exactly once dis- 
appears - when a mutation occurs, we get exactly 
one representative of the new type, we but loose 
all information about their ancestry (if we only 
have the parallelogram, we have no way of know- 
ing that it was previously a triangle...) Unlike the 
forward-time population dynamics, both types of 
events reduce the number of lines in our sample by 
one, so we only need to trace back at n such events 
to determine the ancestry of our entire sample. 

We can obtain a particularly simple representation 
of our process by ignoring the exact times in the 
past at which events occurred, and only consider 
events that change the genealogy of our sample. 


In such an event, that occurred in the time inter- 
val [t,t+ At), when there were m ancestors (m 2 will 
allele A { , i=l,...,j) to the lines in our sample, either 

1 . one of the m t individuals ancestral to the 
sample gave birth, and replaced an indi- 
vidual not currently in the sample, an event 
with probability 

m i 1 ~ At) At + °( At )i 
or, V NJ 

2. one of the N-m lines not in the sample was 
hit by a mutation, with probability (N-m)[AAt 
+ o(At) = (N-m) 6At/N + o(At), and gave rise 
to a new allelic type that is in our sample. 

If we condition on an event happening in [t,t+ At), 
it is of the hrst type with probability 

rm^At + o(At) 
t + miLr At + °( At ) ’ 

and of the second type with probability 

O^At + o^At) 

O^At + Eti m^At + o(At )' 

Dividing the numerators and denominators by 
(N-m)At/N and passing to the limit as At -+ 0, we 
arrive at a discrete time Markov chain in which, 
when we have m lines such that m { carry allele A { , 
the number of lines increase by one with prob- 
abilities ___ n 

and tA- (5) 

for mergers and mutations, respectively. 

With this reduced Markov chain, the proof of Ew- 
ens' sampling formula is quite easy: 

Proof [of ( 2 )]: If our sample contains n { individu- 
als of type i=l,...,k, we must have hrst had the 
hrst individual of that type appear, with prob- 
ability proportional to 6, and then, that individ- 
ual gave birth with probability proportional to 
1, then, one of the two individuals with allele A { 
gave birth with probability proportional to 2, etc. 
Continuing inductively, we arrive at n { individu- 
als when one of the n { - 1 individuals with allele 
A f gives birth. That gives us the numerator of ( 2 ), 
k 

IP(* “ 

i— 1 
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To complete the proof, we note that inde- 
pendent of the type of event, when there are 
m individuals ancestral to the sample, the de- 
nominators of the probabilities in (5) are all 
(6+m). As we go from having no individu- 
als to having n-1 individuals (the last event 
we record is the arrival of the n th line in 
our sample), the denominator is always 
6(6+1)...(6+n-l) y which gives us (2). □ 

In [1], David Aldous gave an amusing interpreta- 
tion of the simplihed Markov chain given by (5), 
which he called the Chinese Restaurant Process: 
Aldous imagined an idealised Chinese restaurant, 
with inhnitely many tables, each with an unlim- 
ited supply of chairs, and a never-ending supply of 
hungry diners who arrive, but never leave. When 
the diners arrive, they either choose to sit at one 
of the currently occupied tables, with a probabil- 
ity proportional to the number of people already 
seated ("the more, the merrier"), or with probabil- 
ity 0, they start a new group by sitting at a cur- 
rently unoccupied table. By contrast, we could 
imagine an English Restaurant Process, where 
everybody entering sits down at a vacant table, 
but that would be much less interesting. 

...and beyond 

A lthough we've looked at Ewens' sampling 
formula in the context of population ge- 
netics, it's use extends far beyond. Indeed, 
(2) was derived independently, in the context of 
abstract nonparametric Bayesian mixture models 
by Charles Antoniak in [2]. The recent rediscov- 
ery of Antoniaks paper by people working in the 
held of machine learning probably affects you on 
a daily basis — if you do keyword searches on the 
internet - via an algorithm known as latent Dir- 
ichlet allocation. When you do a keyword search, 
youre usually lookingfor articles on agiven topic. 
We choose keywords because we expect them to 
occur with high probability within documents on 
the desired topic. The power of latent Dirichlet 
analysis is that it's often possible to identify dis- 
tinct topics without knowing what the topic is, 
nor knowing the meaning of the keywords, some- 
thing that is increasingly valuable as the number of 
pages of text available online grow far faster than 


humans can read them and assign them to topics! 

To understand how this works, lets make an 
analogy with genetics. We'll imagine that all the 
words in each topic is a distinct population, and 
each keyword is an allele. The relative abun- 
dance of the keywords is what dehnes a topic (e.g. 
an article about genetics is much more likely to 
contain words like "allele", "nucleotide", "protein", 
or " Drosophila ", and in higher abundance than 
an article on maths, which will have words like 
"lemma", "theorem", and "proof". Articles like this 
one, on mathematical population genetics might 
confuse things a little, but thats a problem easily 
solved by letting them dehne a new topic). When 
we take a document from a topic, we get a sam- 
ple of alleles from that topic. Much as we could 
use Ewens sampling formula, together with small 
samples of individuals, to reconstruct the popula- 
tion abundances, we can use it, along with sam- 
ple texts, to reconstruct the abundances of key- 
words that dehne the topic, and even determine 
the most probable assignment of a text to topics 
— for example, this article might be assigned 60% 
genetics, and 40% maths, or vice versa - without 
ever having thought about the content at all! See 
[3] for a readable survey of such applications. 

In addition to further rehnements and extensions 
of the sampling formula for applications in genet- 
ics and machine learning, Ewens' work has also 
inspired a growing subheld that investigates ran- 
dom partitions and other objects at the interface 
of probability and combinatorics, a wonderful 
demonstration of the fruitful interaction between 
mathematics and applications, in which the ben- 
ehts run in both directions. A comprehensive list 
of references is impossible, but an excellent point 
of departure is [14]. Perhaps the reader will be in- 
spired to make their own contribution! 

About theauthor 

Dr. Todd L. Parsons is a permanent researcher 
with the Centre national de la recherche scienti- 
fique (CNRS) in France, who divides his time be- 
tween the Laboratoire de Probabilitĕs et Modĕles 
Alĕatoires at Universitĕ Pierre Marie Curie (Paris 
6 ), where he belongs to the Ĕquipe Probabilitĕs, 
Statistiques & Biologie, and the Centre for Inter- 


43 


disciplinary Biology at the Collĕge de France, 
where he belongs to the Stochastic Models for the 
Inference of Life Evolution (SMILE) team. Todd 
gives his deepest gratitude to Prof. Warren J. Ew- 
ens, whose inspiration and mentorship helped 
him dehne his own research career. 

References 

[1] D. J. Aldous. Exchangeability and related 
topics. In P.L. Hennequin, editor, Ĕcole dĔtĕ de 
Probabilitĕs de Saint-Flour, XIII-1983, volume 
1117 of Lecture Notes in Mathematics, pages 1-198. 
Springer Berlin Heidelberg, 1985. 

[2] C. E. Antoniak. Mixtures of Dirichlet process- 
es with applications to Bayesian nonparametric 
problems. Ann. Stat., pages 1152-1174, 1974. 

[3] D. M. Blei. Probabilistic topic models. Com- 
mun. ACM., 55(4): 77-84, 2012. 

[4] R. Durrett. Probability Models for DNA Se- 
quence Eyolution. Springer, New York, 2nd edi- 
tion, 2009. 

[5] A. M. Etheridge. Some mathematical models 
from population genetics. In J. Picard, editor, Ĕcole 
dĔtĕ de Probabilitĕs de Saint-Flour, XXXIX-2009, 
volume 2012 of Lecture Notes in Mathematics, 
pages 1-119, Berlin Heidelberg, 2011. Springer. 


[6] W. J. Ewens. The sampling theory of selectively 
neutral alleles. Theor. Popul. Biol., 3:87-112, 1972. 

[7] W. J. Ewens. Mathematical Population Genetics. 
Springer, Berlin, 1979. 

[8] J. B. S. Haldane. The cost of natural selection. / 
Genetics, 55(3): 511-524, 1957. 

[9] M. Kimura. Evolutionary rate at the molecular 
level. Nature, 217(5129): 624-6, 1968. 

[10] J. F. C. Kingman. The coalesent. Stoch. Proc. 
Appl, 13:235-248, 1982. 

[11] J. F. C. Kingman. Exchangeability and the 
Evolution of Large Populations. In G. Koch and F. 
Spizzichino, editors, Exchangeability in Probabil- 
ity and Statistics, pages 97-112. North- Holland, 
Amsterdam, 1982. 

[12] J. F. C. Kingman. On the genealogy of large 
populations. / Appl. Prob., 19A: 27-43, 1982. 

[13] B. Lewin. Genes V. Oxford University Press 
Oxford, 1994. 

[14] J. Pitman. Combinatorial stochastic process- 
es. In J. Picard, editor, Ĕcole dĔtĕ de Probabilitĕs 
de Saint-Flour, XXXII-2002, volume 1875, pages 
1 - 256, Berlin Heidelberg, 2006. Springer. 

[15] G. A. Watterson. Heterosis or neutrality? Ge- 
netics, 85: 789-814, 1977. 

[16] G. A. Watterson. The homozygosity test of 
neutrality. Genetics, 88(2): 405-417, 1978. 


Appendix 

To show that k ~ 9\n n in the above, our aim is to bound the sum by an integral which we know how to 
evaluate. Noting that 


whereas 


and 


r +1 d , & r 

/ -- dx < > -- < 1 + / 

Ji 0 + x — 1 6 + i — 1 J i 

rn +1 n , 

I -- dx = 0 Mn($ + n) — ln 

J i 0 I x 1 


0 + x — 1 


dx , 


|—-—- dx = 6 ^ln($ + n — 1) 


- 1) - ln 


we get the required bounds. 
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The solutions to cubic and quartic equations 
were found by Cardano and Ferrari respectively 
in the sixteenth century. Cardano solved the de- 
pressed cubic equation, X 3 = ax + b, by splitting 
the variable x as, x = u + v, and then expanding 
the resulting expression; while Ferrari solved the 
depressed quartic equation, x 4 + ax 2 + bx + c = 0, 
by rearranging the terms in the quartic equation 
on either side of equality sign and adding some 
terms on both sides, such that each side becomes 
a perfect square. Since these methods are well 
known, we dont discuss them further and readers 
are advised to see the literature [1]. 


Taking square root of (5) and rearranging the 
terms results in the following two quadratic equa- 
tions. 

u 2 + [2v + V-(2v 2 + a)]u + V~( 2 ^ 2 + <9 2(^2 +o) & ] =0 (6) 

Tbe two equations in (6) contain =F signs at two 
places; the first equation has - sign at both plac- 
es, and the second one has + sign at both places. 
Solving these quadratic equations, we obtain four 
values of u for each value of v. Four solutions of 
quartic equation (1) are then obtained using x = 
u + v. 


However we are curious to know whether the 
quartic equation can be solved in a manner Car- 
dano solved the cubic, i.e., by splitting the variable 
x as, x = u + v, and then expanding the resulting 
expression. Let us make an attempt here. 

Consider the depressed quartic equation, 

x 4 + ax 2 + bx + c = 0 (1) 

where a, b , and c are coefficients in (1). Substitut- 
ing x = a + vin(l) and expanding it we obtain, 

u A + 4 u 3 v + 6u 2 v 2 + 4 uv 3 + v A + au 2 
+ 2 auv + av 2 + bu + bv + c = 0 

Notice that the above equation can be rearranged 
such that the left-hand-side is made a perfect 
square as, 


(u 2 + 2uv) 2 — —(2v 2 + a) (u 2 + 


4v 3 + 2 av + b v 4 + av 2 + bv + c 


2vM 


2v 2 H 


( 2 ) 


Tbe quadratic term in u in the right-hand-side of 
(2) also can be made a perfect square as: 

4v 3 + 2 av + b~\ 2 

ii -+ - 

2{2v 2 + a) 


if the condition, 

(4v 3 + 2 av + b) 2 
4(2v 2 + a) 


(y A + av 2 + bv + c) = 0 


(3) 


is satished. Further simplification of this condi- 
tion (3) leads to the following cubic equation in V 2 
as, 

v 6 + (a/2)v 4 - cv 2 + [(b 2 - 4ac) /8] = 0 (4) 


The cubic (4) in V 2 is known as resolvent cubic 
equation, and solving it results in three values of 
v 2 or six values of v. Now equation (2) becomes a 
perfect square as, 


(u 2 + 2 uv) 2 = —(2v 2 + a)[u + 


4v 3 + 2 av + b 
2(2+ + a) 


] 2 (5) 


Let us solve one numerical example. Consider the 
following depressed quartic equation. 

x 4 + 3x 2 — 6x + 10 = 0 

The resolvent cubic (4) in v 2 is obtained as, 
v 6 + 1.5v 4 - 10v 2 - 10.5 = 0; 
and after solving it, we get three values of V 2 as, 3, 
-1, -3.5, and six values of v as, ±1.732050807568, 
±/, ±1.870828693387, where i 2 = - 1. Choosing v = 
i y the two quadratic equations in (6) are obtained 
as: 


u 2 + iu + 1 + 3i = 0; u 2 + 3 iu — 1 — 3i = 0. 

Solving above equations, we get four values of u 
as: 1 -2/, -1+ i, 1, and -1 -3 i. Using the relation, 
x = u + v, we obtain four solutions of the quartic 
equation as: 1 - i, -1 + 2 i, 1 + i, and -1 - 2 i. No- 
tice that one can choose any of the six values of v 
to solve the quartic equation. Interested readers 
may verify by using other values of v to obtain the 
solutions. 
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0. Introduction 

Sequences are important in many branches of 
mathematics. The sequence of partial sums, for 
example, determines the convergence of an 
inhnite series. Analysis courses abound with 
sequences of functions and the various ways in 
which they converge. Number theorists study the 
Fibonacci and Lucas sequences among many 
others. While the recursive dehnition of the 
Fibonacci sequence involves two prior terms to 
dehne the n -th term, you will see that a slightly 
altered version only requires one prior term - an 
esthetically pleasing and surprising fact! Guest 
appearances by n and e in the article will be ap- 
preciated by lover of mathematics at all levels. 


Letting f(n) = x n x n+1 , (3) becomes f(n+1) =f(n) 
+ 1. Since f(l) = x t x 2 = 1, we hnd that f(n) = n , 
implying that x n x n+1 = n. It follows that 

Xn +1 = ( 4 ) 

In other words, the sequence dehned by (1) has a 
simpler recursive relation (4) in which each 
term depends solely on the preceding term. Note 
that the hrst ten terms satisfy the new recursion. 
We use (4) to write another recursive formula 
from which the sequence may be obtained 
(with the seed x x =x 2 = 1). We have, using (4) 


n + 1 

%n+l 


n + 1 


> or 


^n+2 


n + 1 


( 5 ) 


In this article, we present several recursive- 
ly dehned sequences and obtain interesting 
results that involve relatively simple tools such 
as limits and asymptotic behavior. Students 
encounter asymptotic behavior in a hrst year cal- 
culus course when they take ratios of dominant 
terms to determine the limit of a ratio- 
nal function when x approaches inhnity. 

The only tool in this paper unfamiliar to 
math freshmen is Stirlings Approximation 
which can be explained quite readily. 


II. Several Theorems 

As a consequence of (5), we have the following 
theorem. 

Theorem 1: (++) = e 

Proof: By (5), we have (+ 7 ) = + «) > which 

approaches e as n goes to inhnity. ■ 


If any readers sponsor undergraduate theses, 
this article should be a source of interesting 
problems for future projects with students. 

I. Dehnition 


Let the sequence {x n } be dehned by the seed 
x x =x 2 = 1, and the recursive relation 

1 

%n +2 %n “t - 

%n+l 

The hrst ten terms are 


( 1 ) 


t 2 3 8 15 48 105 384 945 
5 lj 1 ’ 2’ 3 ’ ¥’ 15 ’ " 48 "’ 105 ’ 384 (2) 

For reasons that will be apparent later, we do not 
reduce the fractions in (2) to lowest terms. Now 
multiplying both sides of (1) by x n+1 yields 

Xn+lX n +2 = X n X n +l + 1 ( 3 ) 


We now hnd the closed form for the sequence, 
using (4). The two seed terms will be 
omitted, but they follow the pattern to be estab- 
lished. 









2 _ (1!) 2 2 2 
1 2! 

1-3 _ 1-2-3 _ 3! 

~ 2 2 _ ( 1 !) 2 2 2 
2 • 4 _ (2 • 4) 2 _ (2!) 2 2 4 

L3 _ 1 • 2 • 3 • 4 “ 4! 

1- 3-5 _ 1 • 2 • 3 • 4 • 5 _ 5! 

' 2-4 _ (2 • 4) 2 ~ (2!) 2 2 4 
2 • 4 • 6 _ (2 • 4 • 6) 2 _ (3!) 2 2 6 

1-3-5 “ 1 • 2 • 3 • 4 • 5 • 6 _ 6! 

1 • 3 • 5 • 7 _ l-2-3-4-5-6-7_ 7! 

2- 4-6 _ (2 • 4 • 6) 2 _ (3!) 2 2 6 

2 • 4 • 6 • 8 _ (2 • 4 • 6 • 8) 2 _ (4!) 2 2 8 

1•3•5•7 _ 1-2-3-4-5-6-7-8 “ 8! 


Table 1 

Clearly, the closed form of x n will depend on the 
parity of n. 
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Case 1: n = 2k + 1. Then x 2k+ i = 


(£;!) 2 2 2fc 


Case 2: n = 2k. Then = 


(2k - 1)! 
[(k — 1) !] 2 2 2fe — 


( 6 ) 

(7) 


The reader is reminded that two functions f(x) 

y = y 

and g(x) are called asymptotic if x—>oog(x') 

This does not imply that \f(x) - g(x) | is bounded, 
as can be seen by the pair of asymptotic functions 
f(x) = x 2 +x and g(x) = x 2 whose absolute dif- 
ference goes to inhnity. We write f(x) ~ g(x) to 
denote that f and g are asymptotic. The reader 
is also reminded of Stirlings beautiful relation [1] 


- Vtok 


( 8 ) 


which we will use to obtain asymptotic approxi- 
mations for x 2k and x 2k+1 . We require the easily 
verified asymptotic relations (9) and (10). 


m 2 


2nk 


(9) 


(2fc)!' 


f) 2^ 


( 10 ) 


Using (9) and (10) have 


■^2fc+l — 


(fe!) 2 2 2fc (k/e) 2k 27rk2 2k _ r- 

(2 k)\ (2k/e) 2k 2Wk (H) 


Corollary 1: 


lim 


^2/e+l 


a^oo X 2k 


7T 

2 


Proof: By Theorem 2, we have 

' X 2 k 


Corollary2: lim^^ 

# X^OOX 2 fc+l 


2 

7T 



Proof: x 2k ~ 2 ^ ^ implies, upon replacing k by 


k + 1, that %2k+2 



X 2k+ 2 

x 2k+l 



2 jk + 1 

7T V k 


2 

7T 


As a consequence of Corollaries 1 and 2, note that 
“Sr - fails to exist. Moreover, by these corol- 
laries, there exists a positive integer, fC, such that 
for all k > K, one has x 2k+1 > x 2h while x 2k+2 < x 2k+1 
. In other words, from some point on, terms with 
odd index are greater than the terms immediately 
before them, while the reverse is true for terms 
with even index. Note that this strict alternation 
between increasing and decreasing behavior ap- 
pears to be the case for the terms in (2) with the 
exception of the equality of the first two terms. 


Obtaining an asymptotic approximation for x 2k 
will be more difficult. 

(2k — 1)! (^ 1 ) 2k ~W2n(2k - 1) _ 1 2k - 1 2fc _ x /+/k 

X2k ~ [(k - l)!] 2 2 2fc - 2 ~ (t+yk-22n(k - l) 2fc - 2 ~ e' k - 1 2 2fc -% 

= 12k-± )2k _ 1 2Vk = 12 k-1 1 , 2fc _ 2 2 Vk ~ Jk 

e K 2k-T y/+ e 2fe-2 2k-2 ] y/+ V tt 

The last asymptotic approximation made use of 

the fact that (i + ^) = * 


The next theorem requires the following Lemma 
which we state without proof. 

Lemma 1:Let a and b be positive integers, and 
let p be a prime that divides b but does not 
divide a. Tben a/b is not an integer. 

Tbe reader is reminded that Bertrands Postulate 
[2] says that for n >1, there is at least one prime 
p such that n < p < 2n. 


On the basis of these calculations, we have the fol- 
lowing theorem. 

Theorem 2: X2k ~ 2 A and x 2k+1 ~ V7k 

V 7T 

Letting n = 2k in (4), we have x 2k x 2k+1 =2k, 
which is consistent with the above theorem. 
Furthermore, Theorem 2 implies that } l V/ Xn = 00 . 
The following corollaries of Theorem 2 will 
be useful. 


Theorem 3: Let n > 3. Then x n is not an integer. 

Proof: We have two cases depending on the par- 
ityof n. 

Case 1: n =2k +1, where k > 1. Then 

by (6), x 2k+1 = 2 , . By Bertrands Postulate, 

(2 k)\ 

there exists a prime p such that k < p < 2k. 
Then p divides that denominator, but not the nu- 
merator, of x 2k+1 . Then by Lemma 1, x 2k+1 is not 
an integer. 
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Case 2: n = 2k, where k > 1 . , , 

1 * O • • * (^rC 1J 

From Table 1 , one has x u = 2 . 4 ... ( 2 A; — 2 ) • T * 1611 
2 divides the denominator, but not the numerator, 
of x 2k . Then by Lemma 1, x 2k is not an integer, 
and the theorem is proven. ■ 


Telescoping the sum on the right gives x m + x m+1 
- 1. Thus we have proven the following theorem. 

Theorem 6: ~ = x m + x m+1 -1 ■ 


Let P n = from which X n+l(Pn ) 2 - 

x 1 x 1 x 2 x 2 x 3 x 3 —x ri x ri x n+1 . Recall that (4) can be writ- 
ten as x n x n+1 = n. Then we have X n+l(PJ 2 ~ n •• 

We have proven the following theorem 


Theorem4: Pn 



( 11 ) 


The equations of Table 1 , which were obtained 
from (4), imply that 


x 2k 


x 2k+l 


N 2k _ 1 

-3- 

• • (2k - 1) 


^2k 2 

•4- 

■■(2k-2) 

(12) 

N‘2k+1 


2 • 4 • • ■ (2 k) 


D 2k +i 

' 1 

• 3 • • • (2k — 1) 

(13) 


7 + 1 

Corollary3: z^~diverges. 


Proof: Let m go to inhnity in Theorem 6 . Then 
use Um x n = 00 which was established using 
Theorem 2. ■ 

III. A related sequence 

Let a new sequence {y n } be dehned by the seedy^ 
y 2 = 1 , and the recursive relation 

y n +2 = y n +i ^— (15) 

2/n 

7 23 173 

The hrst few terms are 1 , 1 , 2 ,3, 


In light of (12) and (13), we turn our attention to 
the sequence of numerators {N n }> whose hrst few 
terms are 1, 1, 2, 3, 8, 15, 48, 105, 384, 945. 
Since x n = N n /D n and x n+1 = n/x ny we have N n+1 / 
D n+1 = nDJN n . As we are not reducing these frac- 
tions, it follows that 

N n +1 — tlD u and D n +i N n (14) 

Note that the terms 8, 15, 48, 105 and 384 
of the sequence {N n } can be rewritten 1(3 2 - 
1), 1(4 2 - 1), 2(5 2 - 1), 3(6 2 - 1) and 8(7 2 - 1), where 
the factors before the parentheses are consec- 
utive members of the same sequence. This 
is not a coincidence. By (14), N n+2 = (n+1) 
D n +i= ( n +VN n , implying that N n+4 = (n+3)N n+2 
= (n + 3)(n + l)N=[(n + 2) 2 - 1 ]N n . 

We have proven the following theorem. 

Theorem 5: N n +4 = [(n + 2) 2 - l]iV n> . 


Question 1 : Find a recursive relation of the form 
y n +i = g(y n , n) as was the case for {x n } in section I, 
that is, a relation such as (4). 

By (14), one sees that {y n } is strictly increasing, 
unlike the sequence {x n }. As a consequence, we 
have the following theorem. 

Theorem 7: ~ 00 . 

Proof: Since {y n } is strictly increasing, it must ei- 
ther approach inhnity or have a hnite limit, say L. 
We show the latter is impossible by contradic- 
tion. Assume that = L < 00 . Then taking 
the limit of both sides of (14) as n goes to inhnity 
yields the absurd equation L = L + 1 /L. and we 
are done. ■ 


By the second equation in (13), a similar identity 
holds for the sequence, {DJ, of denominators. 


00 ^ 

We turn our attention to the sequence J~ n 
. Replacing n by n- 1 in (1) yields x n+1 n = x nA 
+ l/x n in which case l/x n = x n+1 - x nA . Dehning 
x 0 = 0 enables us to sum both sides of this last 
equation from n = 1 to n = m. 


Note that l/x^ =x 2 = 1. We obtain 




The next theorem yields a result similar to The- 
orem 6 . 

Theorem 8: S + n = ^™+ 2 “ 1 


Proof: By (14), l/y n =y n+2 -y„ + i- Then 

J2 — = L ^"+ 2 - »»+ i ) 
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which telescopes down toy m+2 -l. ■ 


oo ^ 

Corollary4: X]^diverges. 


IV. Another sequence whose 
recursive dehnition can be 
simplihed 

Let the sequence {x n } be dehned by the seed 
x i= x 2 = 1 , and the recursive relation 

x n +2 = x n+1 + 2 x n (15) 

which is a variant on the recursive relation of the 
Fibonacci sequence. The hrst few terms are 


Proof: We use a variant of induction. The the- 
orem is clearly true for n = 1 and n- 2 . Now 
assume it is true for n-k and n - k + 1. Then 
we have x k+I = 2x k + (-l) k and x k+2 = 2x k+1 + (-l) k+I . 


Using (15) and the preceding two equations, 
we have x k+3 = x k+2 + 2x k+I 
= 2x k+1 + (-l) k+I + 2[2x k + (-l) k [ 

= 2(x k+I + 2x0 + l(-V k+I + 2(-l) k ] 

= 2x k+2 + (-l) k [-l+2] = 2x k+2 + (-l) k 
= 2x k+2 + (~l) k+2 . ■ 


{1,1,3,5,11,21,43,85,...} (i 6 ) 

We have the following interesting theorem which, 
by the way, is the reason the above sequence 
is included in this paper. 

Theorem 9: The sequence dehned by (15) and 
the seed x^= x 2 = 1 satishes the recursion 
x„ +I = 2x n + (-!)". 
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